[MDEV-19614] SET GLOBAL innodb_ deadlock due to LOCK_global_system_variables Created: 2019-05-28  Updated: 2019-09-14  Resolved: 2019-05-28

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.2.25, 10.1.41, 10.3.16, 10.4.6

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: crash, deadlock, hang, upstream


 Description   

The ‘update’ callback functions for several settable global InnoDB variables are acquiring InnoDB latches while holding LOCK_global_system_variables.

On the other hand, some InnoDB code is invoking THDVAR() while holding InnoDB latches. An example of this is thd_lock_wait_timeout() that is called by lock_rec_enqueue_waiting(). In some cases, the intern_sys_var_ptr() that is invoked by THDVAR() may acquire LOCK_global_system_variables, via invoking sync_dynamic_session_variables().

In lock_rec_enqueue_waiting(), we really must be holding some InnoDB latch while invoking THDVAR(). This implies that LOCK_global_system_variables must conceptually reside below any InnoDB latch in the latching order. That in turns implies that the various ‘update’ callback functions must release LOCK_global_system_variables before acquiring any InnoDB mutexes or rw-locks, and reacquire LOCK_global_system_variables later. The ‘validate’ functions are not being invoked while holding LOCK_global_system_variables and thus they do not need any changes.

The following statements are affected by this:

SET GLOBAL innodb_adaptive_hash_index = …;
SET GLOBAL innodb_cmp_per_index_enabled = 1;
SET GLOBAL innodb_old_blocks_pct = …;
SET GLOBAL innodb_fil_make_page_dirty_debug = …; -- debug builds only
SET GLOBAL innodb_buffer_pool_evict = uncompressed; -- debug builds only
SET GLOBAL innodb_purge_run_now = 1; -- debug builds only
SET GLOBAL innodb_purge_stop_now = 1; -- debug builds only
SET GLOBAL innodb_log_checkpoint_now = 1; -- debug builds only
SET GLOBAL innodb_buf_flush_list_now = 1; -- debug builds only
SET GLOBAL innodb_buffer_pool_dump_now = 1;
SET GLOBAL innodb_buffer_pool_load_now = 1;
SET GLOBAL innodb_buffer_pool_load_abort = 1;
SET GLOBAL innodb_status_output = …;
SET GLOBAL innodb_status_output_locks = …;
SET GLOBAL innodb_encryption_threads = …;
SET GLOBAL innodb_encryption_rotate_key_age = …;
SET GLOBAL innodb_encryption_rotation_iops = …;
SET GLOBAL innodb_encrypt_tables = …;
SET GLOBAL innodb_disallow_writes = …;



 Comments   
Comment by Elena Stepanova [ 2019-05-31 ]

For the record and reference, below are stack traces that I was getting due to this problem (taken post-mortem, as the deadlock is so good that even gdb usually cannot connect to the running process), and the grammar/data I used:

10.3 617d34ae

Thread 35 (Thread 0x7f9169704700 (LWP 31209)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f9182a6dce6 in __GI___pthread_mutex_lock (mutex=0x55626f2a7b68 <LOCK_global_system_variables+40>) at ../nptl/pthread_mutex_lock.c:135
#2  0x000055626e6166e7 in safe_mutex_lock (mp=0x55626f2a7b40 <LOCK_global_system_variables>, my_flags=0, file=0x55626e6cf868 "/data/src/10.3/sql/sql_class.cc", line=1173) at /data/src/10.3/mysys/thr_mutex.c:293
#3  0x000055626db059ab in inline_mysql_mutex_lock (that=0x55626f2a7b40 <LOCK_global_system_variables>, src_file=0x55626e6cf868 "/data/src/10.3/sql/sql_class.cc", src_line=1173) at /data/src/10.3/include/mysql/psi/mysql_thread.h:710
#4  0x000055626db096e7 in THD::init (this=0x7f90e0000b00) at /data/src/10.3/sql/sql_class.cc:1173
#5  0x000055626db0a061 in THD::change_user (this=0x7f90e0000b00) at /data/src/10.3/sql/sql_class.cc:1373
#6  0x000055626db0a8fe in THD::reset_for_reuse (this=0x7f90e0000b00) at /data/src/10.3/sql/sql_class.cc:1571
#7  0x000055626dcd3900 in CONNECT::create_thd (this=0x5562711a1810, thd=0x7f90e0000b00) at /data/src/10.3/sql/sql_connect.cc:1498
#8  0x000055626da4e3d6 in cache_thread (thd=0x7f90e0000b00) at /data/src/10.3/sql/mysqld.cc:3027
#9  0x000055626da4e5a2 in one_thread_per_connection_end (thd=0x7f90e0000b00, put_in_cache=true) at /data/src/10.3/sql/mysqld.cc:3090
#10 0x000055626dcd3645 in do_handle_one_connection (connect=0x5562711a1810) at /data/src/10.3/sql/sql_connect.cc:1421
#11 0x000055626dcd32e2 in handle_one_connection (arg=0x5562711a1810) at /data/src/10.3/sql/sql_connect.cc:1308
#12 0x00007f9182a6b4a4 in start_thread (arg=0x7f9169704700) at pthread_create.c:456
#13 0x00007f9180fb3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 28 (Thread 0x7f916974f700 (LWP 31208)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f9182a6dce6 in __GI___pthread_mutex_lock (mutex=0x55626f2a7b68 <LOCK_global_system_variables+40>) at ../nptl/pthread_mutex_lock.c:135
#2  0x000055626e6166e7 in safe_mutex_lock (mp=0x55626f2a7b40 <LOCK_global_system_variables>, my_flags=0, file=0x55626e6b67d0 "/data/src/10.3/sql/sys_vars_shared.h", line=56) at /data/src/10.3/mysys/thr_mutex.c:293
#3  0x000055626da74e84 in inline_mysql_mutex_lock (that=0x55626f2a7b40 <LOCK_global_system_variables>, src_file=0x55626e6b67d0 "/data/src/10.3/sql/sys_vars_shared.h", src_line=56) at /data/src/10.3/include/mysql/psi/mysql_thread.h:710
#4  0x000055626da7a002 in PolyLock_mutex::wrlock (this=0x55626f2abd10 <PLock_global_system_variables>) at /data/src/10.3/sql/sys_vars_shared.h:56
#5  0x000055626da7a076 in AutoWLock::AutoWLock (this=0x7f916974dc30, l=0x55626f2abd10 <PLock_global_system_variables>) at /data/src/10.3/sql/sys_vars_shared.h:74
#6  0x000055626da756f2 in sys_var::update (this=0x5562712554e8, thd=0x7f90ec000b00, var=0x7f90ec011588) at /data/src/10.3/sql/set_var.cc:205
#7  0x000055626da77488 in set_var::update (this=0x7f90ec011588, thd=0x7f90ec000b00) at /data/src/10.3/sql/set_var.cc:837
#8  0x000055626da77113 in sql_set_variables (thd=0x7f90ec000b00, var_list=0x7f90ec0057b8, free=true) at /data/src/10.3/sql/set_var.cc:740
#9  0x000055626db7628f in mysql_execute_command (thd=0x7f90ec000b00) at /data/src/10.3/sql/sql_parse.cc:5147
#10 0x000055626db7f8d8 in mysql_parse (thd=0x7f90ec000b00, rawbuf=0x7f90ec0113d8 "SET @@GLOBAL.innodb_log_checkpoint_now = TRUE /* QNO 3782 CON_ID 14 */", length=70, parser_state=0x7f916974e640, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:8098
#11 0x000055626db6c91a in dispatch_command (command=COM_QUERY, thd=0x7f90ec000b00, packet=0x7f90ec008b61 "SET @@GLOBAL.innodb_log_checkpoint_now = TRUE /* QNO 3782 CON_ID 14 */ ", packet_length=71, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1858
#12 0x000055626db6b33d in do_command (thd=0x7f90ec000b00) at /data/src/10.3/sql/sql_parse.cc:1403
#13 0x000055626dcd356b in do_handle_one_connection (connect=0x55627124bbf0) at /data/src/10.3/sql/sql_connect.cc:1402
#14 0x000055626dcd32e2 in handle_one_connection (arg=0x55627124bbf0) at /data/src/10.3/sql/sql_connect.cc:1308
#15 0x00007f9182a6b4a4 in start_thread (arg=0x7f916974f700) at pthread_create.c:456
#16 0x00007f9180fb3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 27 (Thread 0x7f916979a700 (LWP 31206)):
#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f9182a6dce6 in __GI___pthread_mutex_lock (mutex=0x55626f2a7b68 <LOCK_global_system_variables+40>) at ../nptl/pthread_mutex_lock.c:135
#2  0x000055626e6166e7 in safe_mutex_lock (mp=0x55626f2a7b40 <LOCK_global_system_variables>, my_flags=0, file=0x55626e6b67d0 "/data/src/10.3/sql/sys_vars_shared.h", line=56) at /data/src/10.3/mysys/thr_mutex.c:293
#3  0x000055626da74e84 in inline_mysql_mutex_lock (that=0x55626f2a7b40 <LOCK_global_system_variables>, src_file=0x55626e6b67d0 "/data/src/10.3/sql/sys_vars_shared.h", src_line=56) at /data/src/10.3/include/mysql/psi/mysql_thread.h:710
#4  0x000055626da7a002 in PolyLock_mutex::wrlock (this=0x55626f2abd10 <PLock_global_system_variables>) at /data/src/10.3/sql/sys_vars_shared.h:56
#5  0x000055626da7a076 in AutoWLock::AutoWLock (this=0x7f9169798c30, l=0x55626f2abd10 <PLock_global_system_variables>) at /data/src/10.3/sql/sys_vars_shared.h:74
#6  0x000055626da756f2 in sys_var::update (this=0x5562712554e8, thd=0x7f90e8000b00, var=0x7f90e8011588) at /data/src/10.3/sql/set_var.cc:205
#7  0x000055626da77488 in set_var::update (this=0x7f90e8011588, thd=0x7f90e8000b00) at /data/src/10.3/sql/set_var.cc:837
#8  0x000055626da77113 in sql_set_variables (thd=0x7f90e8000b00, var_list=0x7f90e80057b8, free=true) at /data/src/10.3/sql/set_var.cc:740
#9  0x000055626db7628f in mysql_execute_command (thd=0x7f90e8000b00) at /data/src/10.3/sql/sql_parse.cc:5147
#10 0x000055626db7f8d8 in mysql_parse (thd=0x7f90e8000b00, rawbuf=0x7f90e80113d8 "SET @@GLOBAL.innodb_log_checkpoint_now = TRUE /* QNO 3915 CON_ID 13 */", length=70, parser_state=0x7f9169799640, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:8098
#11 0x000055626db6c91a in dispatch_command (command=COM_QUERY, thd=0x7f90e8000b00, packet=0x7f90e8008b61 "SET @@GLOBAL.innodb_log_checkpoint_now = TRUE /* QNO 3915 CON_ID 13 */ ", packet_length=71, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1858
#12 0x000055626db6b33d in do_command (thd=0x7f90e8000b00) at /data/src/10.3/sql/sql_parse.cc:1403
#13 0x000055626dcd356b in do_handle_one_connection (connect=0x5562711a1810) at /data/src/10.3/sql/sql_connect.cc:1402
#14 0x000055626dcd32e2 in handle_one_connection (arg=0x5562711a1810) at /data/src/10.3/sql/sql_connect.cc:1308
#15 0x00007f9182a6b4a4 in start_thread (arg=0x7f916979a700) at pthread_create.c:456
#16 0x00007f9180fb3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 1 (Thread 0x7f9180062700 (LWP 31197)):
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x000055626e6118d1 in my_write_core (sig=6) at /data/src/10.3/mysys/stacktrace.c:481
#2  0x000055626de69c97 in handle_fatal_signal (sig=6) at /data/src/10.3/sql/signal_handler.cc:343
#3  <signal handler called>
#4  0x00007f9182a7470d in fsync () at ../sysdeps/unix/syscall-template.S:84
#5  0x000055626e144514 in os_file_fsync_posix (file=6) at /data/src/10.3/storage/innobase/os/os0file.cc:2479
#6  0x000055626e144870 in os_file_flush_func (file=6) at /data/src/10.3/storage/innobase/os/os0file.cc:2572
#7  0x000055626e39c290 in pfs_os_file_flush_func (file=..., src_file=0x55626e9ab880 "/data/src/10.3/storage/innobase/fil/fil0fil.cc", src_line=851) at /data/src/10.3/storage/innobase/include/os0file.ic:406
#8  0x000055626e39f291 in fil_flush_low (space=0x556271c6c900, metadata=false) at /data/src/10.3/storage/innobase/fil/fil0fil.cc:851
#9  0x000055626e3ab28f in fil_flush (space_id=0) at /data/src/10.3/storage/innobase/fil/fil0fil.cc:4526
#10 0x000055626e3ab526 in fil_flush_file_spaces (purpose=FIL_TYPE_TABLESPACE) at /data/src/10.3/storage/innobase/fil/fil0fil.cc:4599
#11 0x000055626e3a27b6 in fil_write_flushed_lsn (lsn=184633397) at /data/src/10.3/storage/innobase/fil/fil0fil.cc:1927
#12 0x000055626e0948f6 in checkpoint_now_set (save=0x7f910c013768) at /data/src/10.3/storage/innobase/handler/ha_innodb.cc:18286
#13 0x000055626db903ca in sys_var_pluginvar::global_update (this=0x5562712554e8, thd=0x7f910c000b00, var=0x7f910c013748) at /data/src/10.3/sql/sql_plugin.cc:3530
#14 0x000055626da75739 in sys_var::update (this=0x5562712554e8, thd=0x7f910c000b00, var=0x7f910c013748) at /data/src/10.3/sql/set_var.cc:208
#15 0x000055626da77488 in set_var::update (this=0x7f910c013748, thd=0x7f910c000b00) at /data/src/10.3/sql/set_var.cc:837
#16 0x000055626da77113 in sql_set_variables (thd=0x7f910c000b00, var_list=0x7f910c0057b8, free=true) at /data/src/10.3/sql/set_var.cc:740
#17 0x000055626db7628f in mysql_execute_command (thd=0x7f910c000b00) at /data/src/10.3/sql/sql_parse.cc:5147
#18 0x000055626db7f8d8 in mysql_parse (thd=0x7f910c000b00, rawbuf=0x7f910c013598 "SET @@GLOBAL.innodb_log_checkpoint_now = TRUE /* QNO 4288 CON_ID 12 */", length=70, parser_state=0x7f9180061640, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:8098
#19 0x000055626db6c91a in dispatch_command (command=COM_QUERY, thd=0x7f910c000b00, packet=0x7f910c0837f1 "SET @@GLOBAL.innodb_log_checkpoint_now = TRUE /* QNO 4288 CON_ID 12 */ ", packet_length=71, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1858
#20 0x000055626db6b33d in do_command (thd=0x7f910c000b00) at /data/src/10.3/sql/sql_parse.cc:1403
#21 0x000055626dcd356b in do_handle_one_connection (connect=0x5562711a1810) at /data/src/10.3/sql/sql_connect.cc:1402
#22 0x000055626dcd32e2 in handle_one_connection (arg=0x5562711a1810) at /data/src/10.3/sql/sql_connect.cc:1308
#23 0x00007f9182a6b4a4 in start_thread (arg=0x7f9180062700) at pthread_create.c:456
#24 0x00007f9180fb3d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Grammar 77.yy

query:
	INSERT INTO _table ( _field ) VALUES ( _digit ) ON DUPLICATE KEY UPDATE _field_no_pk = _digit |
	SET @@GLOBAL.innodb_log_checkpoint_now = TRUE;

Data 77.zz

$tables = {
  pk => ['int auto_increment'],
  partitions => [ undef ],
  engines => ['InnoDB'],
  rows => [100,500]
};
 
$fields = {
  types => ['varchar','char','int','blob','text','date'],
  indexes => [undef,'unique','key'],
  null => [undef,'not null'],
  default => [undef,'default null'],
  sign => [undef, 'unsigned'],
  charsets => ['utf8','latin1']
};
 
$data = {
  numbers => ['digit','null',undef],
  temporals => ['datetime','timestamp'],
  strings => ['varchar(1024)','varchar(1024)','english','letter'],
  blobs => ['letter','data','english'],
  temporals => ['date','year','null',undef]
}

Command line, https://github.com/MariaDB/randgen branch elenst-dev 7646c3fe

perl ./runall-new.pl --basedir=/data/bld/10.3-debug-old --vardir=/data/logs/hang19 --duration=350 --threads=3 --reporters=Deadlock  --grammar=77.yy --gendata=77.zz

It didn't happen every time, but frequently enough. Increasing the number of threads would increase the probability.
Couldn't reproduce anymore after this bugfix.

Generated at Thu Feb 08 08:53:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.