[MDEV-6483] Deadlock around rw_lock_debug_mutex on PPC64 Created: 2014-07-24  Updated: 2014-08-29  Resolved: 2014-07-31

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.12
Fix Version/s: 10.0.13

Type: Bug Priority: Critical
Reporter: Sergey Vojtovich Assignee: Sergey Vojtovich
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-6478 MariaDB on Power8 Closed

 Description   

We quite often see stale mysqld process on Power8. At least the following deadlock was identified during debugging:

#0  0x00003fff86742190 in __pthread_cond_wait (cond=0x3fff857f0640, mutex=0x3fff857f0600) at pthread_cond_wait.c:187
#1  0x0000000010a18dd8 in os_cond_wait (cond=0x3fff857f0640, fast_mutex=0x3fff857f0600)
    at /home/u0013565/svoj/10.0/storage/xtradb/os/os0sync.cc:214
#2  0x0000000010a1961c in os_event_wait_low (event=0x3fff857f0600, reset_sig_count=23)
    at /home/u0013565/svoj/10.0/storage/xtradb/os/os0sync.cc:610
#3  0x0000000010b22c2c in rw_lock_debug_mutex_enter () at /home/u0013565/svoj/10.0/storage/xtradb/sync/sync0rw.cc:936
#4  0x0000000010b22ebc in rw_lock_remove_debug_info (lock=0x3fff84b423d8, pass=0, lock_type=352)
    at /home/u0013565/svoj/10.0/storage/xtradb/sync/sync0rw.cc:1012
#5  0x0000000010bd89d0 in rw_lock_s_unlock_func (pass=0, lock=0x3fff84b423d8) at /home/u0013565/svoj/10.0/storage/xtradb/include/sync0rw.ic:614
#6  0x0000000010bd9410 in pfs_rw_lock_s_unlock_func (pass=0, lock=0x3fff84b423d8)
    at /home/u0013565/svoj/10.0/storage/xtradb/include/sync0rw.ic:1251
#7  0x0000000010bde57c in buf_flush_try_neighbors (space=0, offset=104, flush_type=BUF_FLUSH_LIST, n_flushed=75, n_to_flush=100)
    at /home/u0013565/svoj/10.0/storage/xtradb/buf/buf0flu.cc:1291
#8  0x0000000010bde9f4 in buf_flush_page_and_try_neighbors (bpage=0x3fff73ff0e08, flush_type=BUF_FLUSH_LIST, n_to_flush=100, count=0x3fffd40fab18)
    at /home/u0013565/svoj/10.0/storage/xtradb/buf/buf0flu.cc:1392
#9  0x0000000010bdf64c in buf_do_flush_list_batch (buf_pool=0x3fff84be0878, min_n=100, lsn_limit=18446744073709551615)
    at /home/u0013565/svoj/10.0/storage/xtradb/buf/buf0flu.cc:1719
#10 0x0000000010bdfaa8 in buf_flush_batch (buf_pool=0x3fff84be0878, flush_type=BUF_FLUSH_LIST, min_n=100, lsn_limit=18446744073709551615, 
    limited_lru_scan=false, n=0x3fffd40fac58) at /home/u0013565/svoj/10.0/storage/xtradb/buf/buf0flu.cc:1796
#11 0x0000000010be02bc in buf_flush_list (min_n=18446744073709551614, lsn_limit=18446744073709551615, n_processed=0x3fffd40faf48)
    at /home/u0013565/svoj/10.0/storage/xtradb/buf/buf0flu.cc:2051
#12 0x00000000109e77bc in log_preflush_pool_modified_pages (new_oldest=18446744073709551615)
    at /home/u0013565/svoj/10.0/storage/xtradb/log/log0log.cc:1871
#13 0x00000000109e8d04 in log_make_checkpoint_at (lsn=18446744073709551615, write_always=1)
    at /home/u0013565/svoj/10.0/storage/xtradb/log/log0log.cc:2333
#14 0x0000000010bd2c08 in buf_dblwr_create () at /home/u0013565/svoj/10.0/storage/xtradb/buf/buf0dblwr.cc:339
#15 0x0000000010b1c40c in innobase_start_or_create_for_mysql () at /home/u0013565/svoj/10.0/storage/xtradb/srv/srv0start.cc:2667
#16 0x000000001095f320 in innobase_init (p=0x3fff853e0e70) at /home/u0013565/svoj/10.0/storage/xtradb/handler/ha_innodb.cc:3760
#17 0x000000001050d800 in ha_initialize_handlerton (plugin=0x3fff856c4150) at /home/u0013565/svoj/10.0/sql/handler.cc:502
#18 0x000000001029e730 in plugin_initialize (tmp_root=0x3fffd40ff198, plugin=0x3fff856c4150, argc=0x114a35d0 <remaining_argc>, 
    argv=0x3fff85650940, options_only=false) at /home/u0013565/svoj/10.0/sql/sql_plugin.cc:1372
#19 0x000000001029f30c in plugin_init (argc=0x114a35d0 <remaining_argc>, argv=0x3fff85650940, flags=2)
    at /home/u0013565/svoj/10.0/sql/sql_plugin.cc:1608
#20 0x000000001018e3c4 in init_server_components () at /home/u0013565/svoj/10.0/sql/mysqld.cc:4790
#21 0x000000001018fb48 in mysqld_main (argc=48, argv=0x3fff85650940) at /home/u0013565/svoj/10.0/sql/mysqld.cc:5385
#22 0x00000000101824c8 in main (argc=48, argv=0x3fffd40ffa88) at /home/u0013565/svoj/10.0/sql/main.cc:25

From the above trace we can see that it happens during server initialization. The server won't continue initialization and waits for signal forever.

The problem is race conditions in rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit.



 Comments   
Comment by Sergey Vojtovich [ 2014-07-24 ]

Sergei, please review fix for this bug.

Comment by Sergei Golubchik [ 2014-07-30 ]

ok to push

Comment by Sergey Vojtovich [ 2014-07-31 ]

Fixed in 10.0.13:

revno: 4304
revision-id: svoj@mariadb.org-20140724141232-l1ynr5pjyjqz1c63
parent: svoj@mariadb.org-20140723085526-i0v39faneybs7m4p
committer: Sergey Vojtovich <svoj@mariadb.org>
branch nick: 10.0
timestamp: Thu 2014-07-24 18:12:32 +0400
message:
  MDEV-6483 - Deadlock around rw_lock_debug_mutex on PPC64
  
  This problem affects only debug builds on PPC64.
  
  There are at least two race conditions around
  rw_lock_debug_mutex_enter and rw_lock_debug_mutex_exit:
  
  - rw_lock_debug_waiters was loaded/stored without setting
    appropriate locks/memory barriers.
  - there is a gap between calls to os_event_reset() and
    os_event_wait() and in such case we're supposed to pass
    return value of the former to the latter.
  
  Fixed by replacing self-cooked spinlocks with system mutexes.
  These days system mutexes offer much better performance. OTOH
  performance is not that critical for debug builds.

Generated at Thu Feb 08 07:12:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.