[MDEV-10341] InnoDB: Failing assertion: mutex_own(mutex) - mutex_exit_func - multiple MTR tests Created: 2016-07-07  Updated: 2016-09-20  Resolved: 2016-09-20

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.1.16, 10.0.27, 10.2.2
Fix Version/s: 5.5.51, 10.1.18, 10.0.27, 10.2.2

Type: Bug Priority: Blocker
Reporter: Daniel Black Assignee: Vicențiu Ciorbaru
Resolution: Fixed Votes: 0
Labels: powerpc
Environment:

ubuntu 16.04 ppc64le


Attachments: File 10.0.27.log     File 10.1.16.log     File build.log.bz2     File mtr.log    
Issue Links:
Relates
relates to MDEV-7026 Race in InnoDB/XtraDB mutex implement... Closed
Sprint: 10.2.2-1, 10.2.2-2, 10.2.2-3, 5.5.51 & 10.2.2, 10.1.17-1

 Description   

10.2 at commit 848d211c5c4df00b819cd84d7530cf7d29bb0524

build.log is the general build steps.

mtr.log is the exact from these gdb traces:

gdb /build/build/sql/mysqld /dev/shm/var_auto_hNUV/log/rpl.rpl_semi_sync_event-innodb_plugin,mix/mysqld.1/data/core
 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Core was generated by `/build/build/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/build/build'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fff7dfc2e4c in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
58      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x3fff6d3fefb0 (LWP 26338))]
(gdb) bt
#0  0x00003fff7dfc2e4c in __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
#1  0x000000003b7a3664 in my_write_core (sig=<optimized out>) at /source/mysys/stacktrace.c:456
#2  0x000000003b0bc71c in handle_fatal_signal (sig=<optimized out>) at /source/sql/signal_handler.cc:272
#3  <signal handler called>
#4  0x00003fff7da4f27c in __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00003fff7da518f4 in __GI_abort () at abort.c:89
#6  0x00003fff762b6bac in mutex_exit_func (mutex=0x3fff763f54b8 <sync_thread_mutex>) at /source/storage/innobase/include/sync0sync.ic:160
#7  pfs_mutex_exit_func (mutex=0x3fff763f54b8 <sync_thread_mutex>) at /source/storage/innobase/include/sync0sync.ic:314
#8  sync_thread_reset_level (latch=0x3fff7cff7680) at /source/storage/innobase/sync/sync0sync.cc:1388
#9  0x00003fff761f5960 in mutex_exit_func (mutex=<optimized out>) at /source/storage/innobase/include/sync0sync.ic:165
#10 pfs_mutex_exit_func (mutex=<optimized out>) at /source/storage/innobase/include/sync0sync.ic:314
#11 que_run_threads_low (thr=0x3fff7ce59ac0) at /source/storage/innobase/que/que0que.cc:1164
#12 que_run_threads (thr=0x3fff7ce59ac0) at /source/storage/innobase/que/que0que.cc:1182
#13 0x00003fff762c44a8 in trx_purge (n_purge_threads=<optimized out>, batch_size=300, truncate=<optimized out>)
    at /source/storage/innobase/trx/trx0purge.cc:1246
#14 0x00003fff762a5a28 in srv_do_purge (n_total_purged=<synthetic pointer>, n_threads=1) at /source/storage/innobase/srv/srv0srv.cc:2803
#15 srv_purge_coordinator_thread (arg=<optimized out>) at /source/storage/innobase/srv/srv0srv.cc:2978
#16 0x00003fff7dfb84a0 in start_thread (arg=0x3fff6d3fefb0) at pthread_create.c:335
#17 0x00003fff7db37e74 in clone () at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

rpl.rpl_gtid_crash-innodb_plugin,mix/mysqld.2/data/core
 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
Core was generated by `/build/build/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/build/build'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00003fffac312e4c in __pthread_kill (threadid=<optimized out>,
    signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
58      ../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x3fff9f38efb0 (LWP 25472))]
(gdb) bt
#0  0x00003fffac312e4c in __pthread_kill (threadid=<optimized out>,
    signo=<optimized out>) at ../sysdeps/unix/sysv/linux/pthread_kill.c:58
#1  0x0000000045dc3664 in my_write_core (sig=<optimized out>)
    at /source/mysys/stacktrace.c:456
#2  0x00000000456dc71c in handle_fatal_signal (sig=<optimized out>)
    at /source/sql/signal_handler.cc:272
#3  <signal handler called>
#4  0x00003fffabd9f27c in __GI_raise (sig=<optimized out>)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#5  0x00003fffabda18f4 in __GI_abort () at abort.c:89
#6  0x00003fffa46b6214 in mutex_exit_func (
    mutex=0x3fffa47f54b8 <sync_thread_mutex>)
    at /source/storage/innobase/include/sync0sync.ic:160
#7  pfs_mutex_exit_func (mutex=0x3fffa47f54b8 <sync_thread_mutex>)
    at /source/storage/innobase/include/sync0sync.ic:314
#8  sync_thread_add_level (latch=0x3fffab314078, level=140, relock=0)
    at /source/storage/innobase/sync/sync0sync.cc:1336
#9  0x00003fffa46b6284 in mutex_set_debug_info (mutex=0x3fffab314078,
    file_name=0x3fffa4711838 "/source/storage/innobase/buf/buf0dblwr.cc",
    line=712) at /source/storage/innobase/sync/sync0sync.cc:639
#10 0x00003fffa4430818 in mutex_enter_func (line=712,
    file_name=0x3fffa4711838 "/source/storage/innobase/buf/buf0dblwr.cc",
    mutex=0x3fffab314078) at /source/storage/innobase/include/sync0sync.ic:215
---Type <return> to continue, or q <return> to quit---
#11 pfs_mutex_enter_func (line=712,
    file_name=0x3fffa4711838 "/source/storage/innobase/buf/buf0dblwr.cc",
    mutex=0x3fffab314078) at /source/storage/innobase/include/sync0sync.ic:251
#12 buf_dblwr_update (bpage=<optimized out>, flush_type=<optimized out>)
    at /source/storage/innobase/buf/buf0dblwr.cc:712
#13 0x00003fffa4443880 in buf_flush_write_complete (bpage=0x3fffa1bb2668)
    at /source/storage/innobase/buf/buf0flu.cc:695
#14 0x00003fffa4429144 in buf_page_io_complete (bpage=0x3fffa1bb2668,
    evict=<optimized out>) at /source/storage/innobase/buf/buf0buf.cc:4769
#15 0x00003fffa44b6c1c in fil_aio_wait (segment=5)
    at /source/storage/innobase/fil/fil0fil.cc:6059
#16 0x00003fffa46a826c in io_handler_thread (arg=<optimized out>)
    at /source/storage/innobase/srv/srv0start.cc:515
#17 0x00003fffac3084a0 in start_thread (arg=0x3fff9f38efb0)
    at pthread_create.c:335
#18 0x00003fffabe87e74 in clone ()
    at ../sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S:96

This seems to be quite repeatable.



 Comments   
Comment by Elena Stepanova [ 2016-07-07 ]

jplindst, cvicentiu,

Can it be related to the other ppc64el problem that you've been investigating?

Comment by Vicențiu Ciorbaru [ 2016-07-07 ]

elenst So far it doesn't look related. It is something related to locking but I'm not seeing the same crashes here.

Comment by Vicențiu Ciorbaru [ 2016-07-08 ]

elenst After a bit of digging, our other ppc64el problem seems to be related to these changes:

diff --git a/storage/innobase/include/os0sync.h b/storage/innobase/include/os0sync.h
index 1cf4e9c..95e724e 100644
--- a/storage/innobase/include/os0sync.h
+++ b/storage/innobase/include/os0sync.h
@@ -466,7 +466,7 @@ amount to decrement. */
 # define os_atomic_decrement_uint64(ptr, amount) \
 	os_atomic_decrement(ptr, amount)
 
+# if defined(HAVE_ATOMIC_BUILTINS)
-# if defined(IB_STRONG_MEMORY_MODEL)
 
 /** Do an atomic test and set.
 @param[in,out]	ptr		Memory location to set to non-zero
diff --git a/storage/xtradb/include/os0sync.h b/storage/xtradb/include/os0sync.h
index 0f93f3f..dd808e8 100644
--- a/storage/xtradb/include/os0sync.h
+++ b/storage/xtradb/include/os0sync.h
@@ -517,7 +517,7 @@ amount to decrement. */
 # define os_atomic_decrement_uint64(ptr, amount) \
 	os_atomic_decrement(ptr, amount)
 
+# if defined(HAVE_ATOMIC_BUILTINS)
-# if defined(IB_STRONG_MEMORY_MODEL)
 
 /** Do an atomic test and set.
 @param[in,out]	ptr		Memory location to set to non-zero

I doubt they are the cause for this problem, but you can try a reverse patch of this (if it got pulled into 10.2) and see if it fixes the crash.

Comment by Daniel Black [ 2016-07-08 ]

Ok. Will try that patch. I'm also seeing if this is a gcc-6.1.1 issue because I got the same assert on the latest 10.0 branch.

Comment by Daniel Black [ 2016-07-08 ]

Tested gcc-5.3.1 and had the same issue.

Tried with the reverse patch of the above and it didn't cause the assertion.

Comment by Jan Lindström (Inactive) [ 2016-07-08 ]

Above patch was done to fix https://bugs.mysql.com/bug.php?id=79185 in 5.5 so blind revert is not correct.

Comment by Jan Lindström (Inactive) [ 2016-07-11 ]

svoj Analysis:

It came to be even more interesting.

In effect now we have MDEV-7026 back plus incorrect fix for "Z" compilation
that caused another memory barrier issue (something that Vicențiu has
discovered).

I acknowledge that by reverting patch in checkpoint 5 you fixed one of these
issues. Another one (MDEV-7026) is not less critical, but is a bit more complex
to reproduce and it is still around.

Now how to solve this: I'd suggest to rollback changes we got from MySQL, it
will also naturally rollback my patch for the "Z" compilation failure.

How to avoid this in the future? Reimplement this mess and don't take changes
for primitives from MySQL until they hire somebody with clear understanding of
memory models, atomic operations and sync primitives.

Comment by Vicențiu Ciorbaru [ 2016-08-01 ]

Hi Sergey!

I've taken your proposed patch for debian and have changed it to have it compile on Windows as well.

Can you please take a look?

http://lists.askmonty.org/pipermail/commits/2016-August/009582.html

Generated at Thu Feb 08 07:41:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.