Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10341

InnoDB: Failing assertion: mutex_own(mutex) - mutex_exit_func - multiple MTR tests

Details

    • 10.2.2-1, 10.2.2-2, 10.2.2-3, 5.5.51 & 10.2.2, 10.1.17-1

    Description

      10.2 at commit 848d211c5c4df00b819cd84d7530cf7d29bb0524

      build.log is the general build steps.

      mtr.log is the exact from these gdb traces:

      gdb /build/build/sql/mysqld /dev/shm/var_auto_hNUV/log/rpl.rpl_semi_sync_event-innodb_plugin,mix/mysqld.1/data/core
       
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
      Core was generated by `/build/build/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/build/build'.
      Program terminated with signal SIGABRT, Aborted.
      #0  0x00003fff7dfc2e4c in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
      58      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
      [Current thread is 1 (Thread 0x3fff6d3fefb0 (LWP 26338))]
      (gdb) bt
      #0  0x00003fff7dfc2e4c in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
      #1  0x000000003b7a3664 in my_write_core (sig=<optimized out>) at /source/mysys/stacktrace.c:456
      #2  0x000000003b0bc71c in handle_fatal_signal (sig=<optimized out>) at /source/sql/signal_handler.cc:272
      #3  <signal handler called>
      #4  0x00003fff7da4f27c in __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:54
      #5  0x00003fff7da518f4 in __GI_abort () at abort.c:89
      #6  0x00003fff762b6bac in mutex_exit_func (mutex=0x3fff763f54b8 <sync_thread_mutex>) at /source/storage/innobase/include/sync0sync.ic:160
      #7  pfs_mutex_exit_func (mutex=0x3fff763f54b8 <sync_thread_mutex>) at /source/storage/innobase/include/sync0sync.ic:314
      #8  sync_thread_reset_level (latch=0x3fff7cff7680) at /source/storage/innobase/sync/sync0sync.cc:1388
      #9  0x00003fff761f5960 in mutex_exit_func (mutex=<optimized out>) at /source/storage/innobase/include/sync0sync.ic:165
      #10 pfs_mutex_exit_func (mutex=<optimized out>) at /source/storage/innobase/include/sync0sync.ic:314
      #11 que_run_threads_low (thr=0x3fff7ce59ac0) at /source/storage/innobase/que/que0que.cc:1164
      #12 que_run_threads (thr=0x3fff7ce59ac0) at /source/storage/innobase/que/que0que.cc:1182
      #13 0x00003fff762c44a8 in trx_purge (n_purge_threads=<optimized out>, batch_size=300, truncate=<optimized out>)
          at /source/storage/innobase/trx/trx0purge.cc:1246
      #14 0x00003fff762a5a28 in srv_do_purge (n_total_purged=<synthetic pointer>, n_threads=1) at /source/storage/innobase/srv/srv0srv.cc:2803
      #15 srv_purge_coordinator_thread (arg=<optimized out>) at /source/storage/innobase/srv/srv0srv.cc:2978
      #16 0x00003fff7dfb84a0 in start_thread (arg=0x3fff6d3fefb0) at pthread_create.c:335
      #17 0x00003fff7db37e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
      

      rpl.rpl_gtid_crash-innodb_plugin,mix/mysqld.2/data/core
       
      [Thread debugging using libthread_db enabled]
      Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
      Core was generated by `/build/build/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/build/build'.
      Program terminated with signal SIGABRT, Aborted.
      #0  0x00003fffac312e4c in __pthread_kill (threadid=<optimized out>,
          signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
      58      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
      [Current thread is 1 (Thread 0x3fff9f38efb0 (LWP 25472))]
      (gdb) bt
      #0  0x00003fffac312e4c in __pthread_kill (threadid=<optimized out>,
          signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
      #1  0x0000000045dc3664 in my_write_core (sig=<optimized out>)
          at /source/mysys/stacktrace.c:456
      #2  0x00000000456dc71c in handle_fatal_signal (sig=<optimized out>)
          at /source/sql/signal_handler.cc:272
      #3  <signal handler called>
      #4  0x00003fffabd9f27c in __GI_raise (sig=<optimized out>)
          at ../sysdeps/unix/sysv/linux/raise.c:54
      #5  0x00003fffabda18f4 in __GI_abort () at abort.c:89
      #6  0x00003fffa46b6214 in mutex_exit_func (
          mutex=0x3fffa47f54b8 <sync_thread_mutex>)
          at /source/storage/innobase/include/sync0sync.ic:160
      #7  pfs_mutex_exit_func (mutex=0x3fffa47f54b8 <sync_thread_mutex>)
          at /source/storage/innobase/include/sync0sync.ic:314
      #8  sync_thread_add_level (latch=0x3fffab314078, level=140, relock=0)
          at /source/storage/innobase/sync/sync0sync.cc:1336
      #9  0x00003fffa46b6284 in mutex_set_debug_info (mutex=0x3fffab314078,
          file_name=0x3fffa4711838 "/source/storage/innobase/buf/buf0dblwr.cc",
          line=712) at /source/storage/innobase/sync/sync0sync.cc:639
      #10 0x00003fffa4430818 in mutex_enter_func (line=712,
          file_name=0x3fffa4711838 "/source/storage/innobase/buf/buf0dblwr.cc",
          mutex=0x3fffab314078) at /source/storage/innobase/include/sync0sync.ic:215
      ---Type <return> to continue, or q <return> to quit---
      #11 pfs_mutex_enter_func (line=712,
          file_name=0x3fffa4711838 "/source/storage/innobase/buf/buf0dblwr.cc",
          mutex=0x3fffab314078) at /source/storage/innobase/include/sync0sync.ic:251
      #12 buf_dblwr_update (bpage=<optimized out>, flush_type=<optimized out>)
          at /source/storage/innobase/buf/buf0dblwr.cc:712
      #13 0x00003fffa4443880 in buf_flush_write_complete (bpage=0x3fffa1bb2668)
          at /source/storage/innobase/buf/buf0flu.cc:695
      #14 0x00003fffa4429144 in buf_page_io_complete (bpage=0x3fffa1bb2668,
          evict=<optimized out>) at /source/storage/innobase/buf/buf0buf.cc:4769
      #15 0x00003fffa44b6c1c in fil_aio_wait (segment=5)
          at /source/storage/innobase/fil/fil0fil.cc:6059
      #16 0x00003fffa46a826c in io_handler_thread (arg=<optimized out>)
          at /source/storage/innobase/srv/srv0start.cc:515
      #17 0x00003fffac3084a0 in start_thread (arg=0x3fff9f38efb0)
          at pthread_create.c:335
      #18 0x00003fffabe87e74 in clone ()
          at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96
      

      This seems to be quite repeatable.

      Attachments

        1. 10.0.27.log
          6 kB
          Daniel Black
        2. 10.1.16.log
          7 kB
          Daniel Black
        3. build.log.bz2
          299 kB
          Daniel Black
        4. mtr.log
          127 kB
          Daniel Black

        Issue Links

          Activity

            jplindst, cvicentiu,

            Can it be related to the other ppc64el problem that you've been investigating?

            elenst Elena Stepanova added a comment - jplindst , cvicentiu , Can it be related to the other ppc64el problem that you've been investigating?

            elenst So far it doesn't look related. It is something related to locking but I'm not seeing the same crashes here.

            cvicentiu Vicențiu Ciorbaru added a comment - elenst So far it doesn't look related. It is something related to locking but I'm not seeing the same crashes here.

            elenst After a bit of digging, our other ppc64el problem seems to be related to these changes:

            diff --git a/storage/innobase/include/os0sync.h b/storage/innobase/include/os0sync.h
            index 1cf4e9c..95e724e 100644
            --- a/storage/innobase/include/os0sync.h
            +++ b/storage/innobase/include/os0sync.h
            @@ -466,7 +466,7 @@ amount to decrement. */
             # define os_atomic_decrement_uint64(ptr, amount) \
             	os_atomic_decrement(ptr, amount)
             
            +# if defined(HAVE_ATOMIC_BUILTINS)
            -# if defined(IB_STRONG_MEMORY_MODEL)
             
             /** Do an atomic test and set.
             @param[in,out]	ptr		Memory location to set to non-zero
            diff --git a/storage/xtradb/include/os0sync.h b/storage/xtradb/include/os0sync.h
            index 0f93f3f..dd808e8 100644
            --- a/storage/xtradb/include/os0sync.h
            +++ b/storage/xtradb/include/os0sync.h
            @@ -517,7 +517,7 @@ amount to decrement. */
             # define os_atomic_decrement_uint64(ptr, amount) \
             	os_atomic_decrement(ptr, amount)
             
            +# if defined(HAVE_ATOMIC_BUILTINS)
            -# if defined(IB_STRONG_MEMORY_MODEL)
             
             /** Do an atomic test and set.
             @param[in,out]	ptr		Memory location to set to non-zero
            

            I doubt they are the cause for this problem, but you can try a reverse patch of this (if it got pulled into 10.2) and see if it fixes the crash.

            cvicentiu Vicențiu Ciorbaru added a comment - elenst After a bit of digging, our other ppc64el problem seems to be related to these changes: diff --git a/storage/innobase/include/os0sync.h b/storage/innobase/include/os0sync.h index 1cf4e9c..95e724e 100644 --- a/storage/innobase/include/os0sync.h +++ b/storage/innobase/include/os0sync.h @@ -466,7 +466,7 @@ amount to decrement. */ # define os_atomic_decrement_uint64(ptr, amount) \ os_atomic_decrement(ptr, amount) +# if defined(HAVE_ATOMIC_BUILTINS) -# if defined(IB_STRONG_MEMORY_MODEL) /** Do an atomic test and set. @param[in,out] ptr Memory location to set to non-zero diff --git a/storage/xtradb/include/os0sync.h b/storage/xtradb/include/os0sync.h index 0f93f3f..dd808e8 100644 --- a/storage/xtradb/include/os0sync.h +++ b/storage/xtradb/include/os0sync.h @@ -517,7 +517,7 @@ amount to decrement. */ # define os_atomic_decrement_uint64(ptr, amount) \ os_atomic_decrement(ptr, amount) +# if defined(HAVE_ATOMIC_BUILTINS) -# if defined(IB_STRONG_MEMORY_MODEL) /** Do an atomic test and set. @param[in,out] ptr Memory location to set to non-zero I doubt they are the cause for this problem, but you can try a reverse patch of this (if it got pulled into 10.2) and see if it fixes the crash.
            danblack Daniel Black added a comment -

            Ok. Will try that patch. I'm also seeing if this is a gcc-6.1.1 issue because I got the same assert on the latest 10.0 branch.

            danblack Daniel Black added a comment - Ok. Will try that patch. I'm also seeing if this is a gcc-6.1.1 issue because I got the same assert on the latest 10.0 branch.
            danblack Daniel Black added a comment -

            Tested gcc-5.3.1 and had the same issue.

            Tried with the reverse patch of the above and it didn't cause the assertion.

            danblack Daniel Black added a comment - Tested gcc-5.3.1 and had the same issue. Tried with the reverse patch of the above and it didn't cause the assertion.

            Above patch was done to fix https://bugs.mysql.com/bug.php?id=79185 in 5.5 so blind revert is not correct.

            jplindst Jan Lindström (Inactive) added a comment - Above patch was done to fix https://bugs.mysql.com/bug.php?id=79185 in 5.5 so blind revert is not correct.

            svoj Analysis:

            It came to be even more interesting.

            In effect now we have MDEV-7026 back plus incorrect fix for "Z" compilation
            that caused another memory barrier issue (something that Vicențiu has
            discovered).

            I acknowledge that by reverting patch in checkpoint 5 you fixed one of these
            issues. Another one (MDEV-7026) is not less critical, but is a bit more complex
            to reproduce and it is still around.

            Now how to solve this: I'd suggest to rollback changes we got from MySQL, it
            will also naturally rollback my patch for the "Z" compilation failure.

            How to avoid this in the future? Reimplement this mess and don't take changes
            for primitives from MySQL until they hire somebody with clear understanding of
            memory models, atomic operations and sync primitives.

            jplindst Jan Lindström (Inactive) added a comment - svoj Analysis: It came to be even more interesting. Checkpoint 1: we have functional InnoDB synchronization primitives, MySQL primitives don't work with relaxed memory model. Checkpoint 2: MySQL makes silly attempt to fix primitives to work with relaxed memory model: https://github.com/MariaDB/server/commit/f59d68eeae37338d7b25f2571407e763fa897e15 We merge it and it breaks compilation on "Z" and reintroduces MDEV-7026 (probably creates even more bugs). Checkpoint 3: we fix compilation failure on "Z": https://github.com/MariaDB/server/commit/f31a89191f6b1679d9f8b584aa95fe006182ee7d And the fix was alright (didn't have side effects). Checkpoint 4: MySQL realizes that their fix for relaxed memory model (checkpoint 2) broke even strong memory model (because they don't have fix for MDEV-7026 ). They create even more silly workaround for that: https://github.com/MariaDB/server/commit/93e6f388860490a6066cc07253a6aab94029e502 This breaks again compilation on "Z". Checkpoint 5: Sergei applies my patch (as in checkpoint 3), but my patch is not applicable anymore because of rearrangements done in checkpoint 4: https://github.com/MariaDB/server/commit/02043d7c53857f598a3183c7607587bfdc46ed74 In effect now we have MDEV-7026 back plus incorrect fix for "Z" compilation that caused another memory barrier issue (something that Vicențiu has discovered). I acknowledge that by reverting patch in checkpoint 5 you fixed one of these issues. Another one ( MDEV-7026 ) is not less critical, but is a bit more complex to reproduce and it is still around. Now how to solve this: I'd suggest to rollback changes we got from MySQL, it will also naturally rollback my patch for the "Z" compilation failure. How to avoid this in the future? Reimplement this mess and don't take changes for primitives from MySQL until they hire somebody with clear understanding of memory models, atomic operations and sync primitives.

            Hi Sergey!

            I've taken your proposed patch for debian and have changed it to have it compile on Windows as well.

            Can you please take a look?

            http://lists.askmonty.org/pipermail/commits/2016-August/009582.html

            cvicentiu Vicențiu Ciorbaru added a comment - Hi Sergey! I've taken your proposed patch for debian and have changed it to have it compile on Windows as well. Can you please take a look? http://lists.askmonty.org/pipermail/commits/2016-August/009582.html

            People

              cvicentiu Vicențiu Ciorbaru
              danblack Daniel Black
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.