[MDEV-6515] crash due to assertion on power8 Created: 2014-08-01  Updated: 2014-11-20  Resolved: 2014-09-05

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.13
Fix Version/s: 5.5.40, 10.0.14

Type: Bug Priority: Major
Reporter: Axel Schwenke Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None
Environment:

power8, RH6.5


Issue Links:
PartOf
is part of MDEV-6478 MariaDB on Power8 Closed
is part of MDEV-6530 Examine and apply Power8 patches sugg... Closed
Relates
relates to MDEV-7148 Recurring: InnoDB: Failing assertion:... Closed

 Description   

NB: The fix for this bug is also present in Stewart Smith's patchset: memory_barrier-experimental_5.6.4.diff.

From the error log:

2014-07-31 21:02:00 ff6fb757190  InnoDB: Assertion failure in thread 17553455149456 in file sync0rw.cc line 690
InnoDB: Failing assertion: !lock->recursive
InnoDB: We intentionally generate a memory trap.
...
stack_bottom = 0xff6fb756610 thread_stack 0x48000
:0(000000ca.plt_call.MD5_Init)[0x109b476c]
:0(000000ca.plt_call.MD5_Init)[0x103d7180]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0xfff8fa30448]
/opt/at7.0/lib64/power7/libc.so.6(gsignal-0x16f708)[0xfff8f1cf8f0]
/opt/at7.0/lib64/power7/libc.so.6(abort-0x16dab4)[0xfff8f1d19c4]
sync/sync0rw.cc:690(000000ca.plt_call.MD5_Init)[0x10124318]
sync/sync0rw.cc:834(000000ca.plt_call.MD5_Init)[0x107d2b08]
include/sync0rw.ic:917(pfs_rw_lock_x_lock_func)[0x108329b4]
include/btr0sea.ic:81(000000ca.plt_call.MD5_Init)[0x1081c85c]
include/btr0pcur.ic:485(btr_pcur_open_with_no_init_func)[0x107b3c74]
handler/ha_innodb.cc:8374(000000ca.plt_call.MD5_Init)[0x106dfc4c]
sql/handler.h:2888(000000ca.plt_call.MD5_Init)[0x103e8f64]
sql/handler.cc:5520(000000ca.plt_call.MD5_Init)[0x103d7e6c]
sql/handler.cc:2609(000000ca.plt_call.MD5_Init)[0x103de780]
sql/sql_select.cc:18167(000000ca.plt_call.MD5_Init)[0x1023b53c]
sql/table.h:1366(disable_keyread)[0x10115844]
sql/sql_select.cc:3785(000000ca.plt_call.MD5_Init)[0x1011980c]
sql/sql_select.cc:1338(optimize_inner)[0x1026b5fc]
sql/sql_select.cc:3289(mysql_select)[0x10270180]
...
Query (0xff68001a850): SELECT c FROM sbtest18 WHERE id=4968
Connection ID (thread ID): 1287

This is MariaDB-10.0, bzr revision 4308, compiled with ATC 7.0. Unlike the previous (working) binaries, this one uses libaio.



 Comments   
Comment by Sergey Vojtovich [ 2014-08-07 ]

Below are my comments on the InnoDB memory barriers framework. I will post an additional comment on the correctness of the barriers once I complete the review.

  • No action: non-atomic loads/stores of shared variables are evil, but nobody seems to care, since all loads are 32-bit, which are known to be atomic.
  • No action: my_atomic.h doesn't need cmake probes; all checks are done via ifdefs, which to my taste is more compact. But since InnoDB accepted the memory barriers patch with cmake probes, we probably shouldn't bother about it either.
  • No action: we wondered about the reasons for reducing the number of spins. A comment added along with rev.6004 to MySQL-5.6.20 explains it: the internal counter for innodb_sync_spin_loops is adjusted because a memory barrier is more expensive than an empty loop.
  • Action: the definition of HAVE_WINDOWS_MM_FENCE is missing from CMakeLists.txt. See how it is handled in rev.6004 of MySQL-5.6.20.
  • # define os_rmb do { } while(0) and # define os_wmb do { } while(0)
    do {} while(0) is excessive; just #define os_rmb should be fine.
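
To illustrate the last point, here is a minimal C++11 sketch of barrier macros that expand to nothing on platforms that need no explicit barrier. The names demo_rmb, demo_wmb, DEMO_NO_BARRIERS, DemoSlot, demo_publish and demo_consume are hypothetical and are not taken from InnoDB. The fences come from the standard library rather than from whatever the actual patch uses.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for os_rmb / os_wmb; not InnoDB's definitions.
#if defined(DEMO_NO_BARRIERS)
// Empty expansion: "demo_rmb;" becomes a null statement, so no
// do { } while(0) wrapper is required.
# define demo_rmb
# define demo_wmb
#else
# define demo_rmb std::atomic_thread_fence(std::memory_order_acquire)
# define demo_wmb std::atomic_thread_fence(std::memory_order_release)
#endif

struct DemoSlot {
    std::atomic<int>  payload{0};
    std::atomic<bool> ready{false};
};

// Writer: store the payload, then the write barrier, then the flag.
inline void demo_publish(DemoSlot& slot, int value) {
    slot.payload.store(value, std::memory_order_relaxed);
    demo_wmb;
    slot.ready.store(true, std::memory_order_relaxed);
}

// Reader: check the flag, then the read barrier, then load the payload.
inline bool demo_consume(DemoSlot& slot, int* out) {
    assert(out != nullptr);
    if (!slot.ready.load(std::memory_order_relaxed)) {
        return false;
    }
    demo_rmb;
    *out = slot.payload.load(std::memory_order_relaxed);
    return true;
}
```

Both macro variants are used as plain statements (`demo_wmb;`), which is why the empty definition is sufficient in this sketch.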
Comment by Sergey Vojtovich [ 2014-08-08 ]

On memory barriers in mutexes:

- mutex_get_waiters() misses an acquire memory barrier. This may cause
  mutex_exit_func() to read a stale 'waiters' value and be the reason
  for a deadlock.
 
  There seems to be a workaround for that: srv_error_monitor_thread()
  is supposed to wake these stale threads every second. But if that's
  the case, we don't really need the release memory barrier in
  mutex_set_waiters().
 
- ib_mutex_test_and_set(): the release memory barrier should not be needed;
  we hold the mutex anyway and don't care at which point lock_word
  becomes visible to other threads.
 
- mutex_get_lock_word(): the acquire memory barrier should not be needed.
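
The stale 'waiters' scenario from the first item can be modelled with a toy mutex. This is a simplified C++11 sketch, not InnoDB code: ToyMutex, toy_try_enter, toy_prepare_to_sleep and toy_exit are hypothetical names. The sketch uses sequentially consistent fences, which are stronger than the acquire barrier discussed above; that is a choice made for the sketch, not a statement about the actual fix.

```cpp
#include <atomic>
#include <cassert>

// Toy model of a mutex with a lock word and a waiters flag.
struct ToyMutex {
    std::atomic<int> lock_word{0};  // 1 while the mutex is held
    std::atomic<int> waiters{0};    // 1 when a thread is about to sleep
};

// Returns true if the caller acquired the mutex.
inline bool toy_try_enter(ToyMutex& m) {
    return m.lock_word.exchange(1, std::memory_order_acquire) == 0;
}

// Waiter side: announce the intent to sleep, then re-check the lock word.
// Returns true if the lock is still held, so the caller should sleep.
inline bool toy_prepare_to_sleep(ToyMutex& m) {
    m.waiters.store(1, std::memory_order_relaxed);
    // Order the store above before the load below.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return m.lock_word.load(std::memory_order_relaxed) != 0;
}

// Owner side: release the lock word, then look at the waiters flag.
// Returns true if the caller must wake a sleeping thread.
inline bool toy_exit(ToyMutex& m) {
    assert(m.lock_word.load(std::memory_order_relaxed) == 1);
    m.lock_word.store(0, std::memory_order_release);
    // Without ordering here, the load of 'waiters' could be satisfied
    // before the store to lock_word is visible, and a waiter could be missed.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (m.waiters.load(std::memory_order_relaxed) != 0) {
        m.waiters.store(0, std::memory_order_relaxed);
        return true;
    }
    return false;
}
```

With a fence on both sides, either toy_exit() observes waiters == 1 or toy_prepare_to_sleep() observes lock_word == 0, so in this model a waiter cannot go to sleep unnoticed.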

Comment by Sergey Vojtovich [ 2014-08-08 ]

Neither of the acquire memory barriers in sync_arr_cell_can_wake_up() should be needed.

Comment by Sergey Vojtovich [ 2014-09-05 ]

revno: 3413.65.7
revision-id: monty@mariadb.org-20140819162835-sorv0ogd39f7mui8
parent: knielsen@knielsen-hq.org-20140813134639-wk760plnzg5wu4x8
committer: Michael Widenius <monty@mariadb.org>
branch nick: maria-5.5
timestamp: Tue 2014-08-19 19:28:35 +0300
message:
MDEV-6450 - MariaDB crash on Power8 when built with advance tool chain
 
Part of this work is based on Stewart Smith's memory barrier and lower priori
patches for power8.
 
- Added memory synchronization for innodb & xtradb for power8.
- Added HAVE_WINDOWS_MM_FENCE to CMakeLists.txt
- Added os_isync to fix a synchronization problem on power
- Added log_get_lsn_nowait which is now used srv_error_monitor_thread to ensur
  if log mutex is locked.
 
All changes done both for InnoDB and Xtradb
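
The commit message mentions log_get_lsn_nowait in connection with the log mutex being locked. A non-blocking read of that kind could be sketched as follows. This is only an illustration in C++11: ToyLog, toy_log_try_lock, toy_log_unlock and toy_get_lsn_nowait are hypothetical names, and returning 0 when the lock is busy is an assumption of the sketch, since the signature and return convention of the real function are not shown here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy log state guarded by a simple try-lock; not the InnoDB log_sys.
struct ToyLog {
    std::atomic<bool> locked{false};
    std::uint64_t     lsn = 0;
};

// Returns true if the lock was free and is now held by the caller.
inline bool toy_log_try_lock(ToyLog& log) {
    return !log.locked.exchange(true, std::memory_order_acquire);
}

inline void toy_log_unlock(ToyLog& log) {
    assert(log.locked.load(std::memory_order_relaxed));
    log.locked.store(false, std::memory_order_release);
}

// Reads the LSN without blocking. Returns 0 (assumed sentinel) if the
// lock is currently held by someone else.
inline std::uint64_t toy_get_lsn_nowait(ToyLog& log) {
    if (!toy_log_try_lock(log)) {
        return 0;
    }
    const std::uint64_t lsn = log.lsn;
    toy_log_unlock(log);
    return lsn;
}
```

A monitoring thread that calls a function like this never waits on the lock, so it cannot itself get stuck behind a stalled lock holder.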
