[MDEV-32728] safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'wait_mutex' Created: 2023-11-07  Updated: 2023-11-09  Resolved: 2023-11-09

Status: Closed
Project: MariaDB Server
Component/s: Locking
Affects Version/s: 10.6
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: regression, rr-profile-analyzed

Issue Links:
Problem/Incident
is caused by MDEV-32530 Race condition in lock_wait_rpl_report() Closed

 Description   

bb-10.6-release 02e97252

safe_mutex: Found wrong usage of mutex 'LOCK_thd_data' and 'wait_mutex'
Mutex currently locked (in reverse order):
wait_mutex                        /data/src/10.6/storage/innobase/handler/ha_innodb.cc  line 5024
LOCK_thd_data                     /data/src/10.6/sql/sql_class.h  line 3851
LOCK_thd_kill                     /data/src/10.6/sql/sql_class.h  line 3850

The failure started happening after this commit in 10.6

commit e039720bf3494a35b34eb0ddc55af170a1807723
Author: Marko Mäkelä
Date:   Mon Sep 11 14:51:02 2023 +0300
 
    MDEV-32096 Parallel replication lags because innobase_kill_query() may fail to interrupt a lock wait

That's expected, as the mutex lock was only added there.
Since the same test doesn't trigger any other failures for me on a build before the commit, I'll consider it a regression for now. Feel free to demote if the analysis shows otherwise.



 Comments   
Comment by Marko Mäkelä [ 2023-11-08 ]

The print_deadlock_warning() is being invoked for the following call to safe_mutex_lock() on lock_sys.wait_mutex. At this point, thd->LOCK_thd_data is indeed being held:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

#0  print_deadlock_warning (new_mutex=0x60700001e028, parent_mutex=0x607000187060) at /data/MDEV-32728/src2/mysys/thr_mutex.c:808
#1  0x000055d9bb3de871 in safe_mutex_lock (mp=0x55d9be2e8580 <lock_sys+192>, my_flags=0, file=0x55d9bc170820 "/data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc", line=5024)
    at /data/MDEV-32728/src2/mysys/thr_mutex.c:357
#2  0x000055d9ba96771e in inline_mysql_mutex_lock (that=0x55d9be2e8580 <lock_sys+192>, src_file=0x55d9bc170820 "/data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc", src_line=5024)
    at /data/MDEV-32728/src2/include/mysql/psi/mysql_thread.h:750
#3  0x000055d9ba9828bf in innobase_kill_query (thd=0x62b0000d9218) at /data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc:5024
#4  0x000055d9b9f1730d in kill_handlerton (thd=0x62b0000d9218, plugin=0x6030000019d8, level=0x68b2740828d0) at /data/MDEV-32728/src2/sql/handler.cc:936
#5  0x000055d9b96bbd86 in plugin_foreach_with_mask (thd=0x62b0000d9218, func=0x55d9b9f17085 <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x68b2740828d0)
    at /data/MDEV-32728/src2/sql/sql_plugin.cc:2558
#6  0x000055d9b9f17415 in ha_kill_query (thd=0x62b0000d9218, level=THD_ABORT_SOFTLY) at /data/MDEV-32728/src2/sql/handler.cc:943
#7  0x000055d9b953cca1 in THD::awake_no_mutex (this=0x62b0000d9218, state_to_set=KILL_TIMEOUT) at /data/MDEV-32728/src2/sql/sql_class.cc:1942
#8  0x000055d9b936b2ae in THD::awake (this=0x62b0000d9218, state_to_set=KILL_TIMEOUT) at /data/MDEV-32728/src2/sql/sql_class.h:3852
#9  0x000055d9b9530263 in thd_kill_timeout (thd=0x62b0000d9218) at /data/MDEV-32728/src2/sql/sql_class.cc:629
#10 0x000055d9bb3e4728 in process_timers (now=0x68b274082bc0) at /data/MDEV-32728/src2/mysys/thr_timer.c:271
#11 0x000055d9bb3e4ceb in timer_handler (arg=0x0) at /data/MDEV-32728/src2/mysys/thr_timer.c:315
#12 0x000024832c0b8609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#13 0x000030d05f19e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I tried to find evidence of the wrong usage, by setting a data watchpoint:

break print_deadlock_warning
continue
frame 3
watch -l thd->LOCK_thd_data.m_mutex.mutex.__data.__owner
condition 2 lock_sys.wait_mutex.m_mutex.mutex.__data.__owner
run

On the rerun of the rr replay trace, none of the occurrences of the watchpoint satisfied the condition, that is, lock_sys.wait_mutex was not actually locked by any thread whenever the THD::LOCK_thd_data in question was acquired or released.

I then debugged the safe_mutex code to find out what makes it ‘think’ that there is a locking order inversion. This is where the hash table lookup succeeds:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

#0  my_strnncoll_binary (cs=0x55d9bd7ae2e0 <my_charset_bin>, s=0x60700001e050 <incomplete sequence \327>, slen=8, t=0x62b0000db070 <incomplete sequence \327>, tlen=8, t_is_prefix=0 '\000')
    at /data/MDEV-32728/src2/strings/ctype-bin.c:91
#1  0x000055d9bb35c499 in hashcmp (hash=0x612000083a58, pos=0x61d000a528b8, key=0x62b0000db070 <incomplete sequence \327>, length=0) at /data/MDEV-32728/src2/mysys/hash.c:374
#2  0x000055d9bb35bd0c in my_hash_first_from_hash_value (hash=0x612000083a58, hash_value=926365490, key=0x62b0000db070 <incomplete sequence \327>, length=0, current_record=0x68b274082470)
    at /data/MDEV-32728/src2/mysys/hash.c:289
#3  0x000055d9bb35bb27 in my_hash_first (hash=0x612000083a58, key=0x62b0000db070 <incomplete sequence \327>, length=0, current_record=0x68b274082470) at /data/MDEV-32728/src2/mysys/hash.c:263
#4  0x000055d9bb35b81a in my_hash_search (hash=0x612000083a58, key=0x62b0000db070 <incomplete sequence \327>, length=0) at /data/MDEV-32728/src2/mysys/hash.c:236
#5  0x000055d9bb3de855 in safe_mutex_lock (mp=0x55d9be2e8580 <lock_sys+192>, my_flags=0, file=0x55d9bc170820 "/data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc", line=5024)
    at /data/MDEV-32728/src2/mysys/thr_mutex.c:355
#6  0x000055d9ba96771e in inline_mysql_mutex_lock (that=0x55d9be2e8580 <lock_sys+192>, src_file=0x55d9bc170820 "/data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc", src_line=5024)
    at /data/MDEV-32728/src2/include/mysql/psi/mysql_thread.h:750
#7  0x000055d9ba9828bf in innobase_kill_query (thd=0x62b0000d9218) at /data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc:5024

I set one more watchpoint to find out when the problematic hash table entry was inserted:

disable 2
watch -l *(void**)0x60700001e050
run

The only write was very close to the start of the trace:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

Thread 2 hit Hardware watchpoint 3: -location *(void**)0x60700001e050
 
Old value = (void *) 0x0
New value = (void *) 0xd7
safe_mutex_lock (mp=0x62b0000daff8, my_flags=0, file=0x55d9bb6c7de0 "/data/MDEV-32728/src2/sql/sql_class.cc", line=839) at /data/MDEV-32728/src2/mysys/thr_mutex.c:346
346	          deadlock->mutex= mp;
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
2: x/i $pc
=> 0x55d9bb3de769 <safe_mutex_lock+2363>:	mov    -0x18(%rbp),%rax
(rr) bt
#0  safe_mutex_lock (mp=0x62b0000daff8, my_flags=0, file=0x55d9bb6c7de0 "/data/MDEV-32728/src2/sql/sql_class.cc", line=839) at /data/MDEV-32728/src2/mysys/thr_mutex.c:346
#1  0x000055d9b952b00b in inline_mysql_mutex_lock (that=0x62b0000daff8, src_file=0x55d9bb6c7de0 "/data/MDEV-32728/src2/sql/sql_class.cc", src_line=839)
    at /data/MDEV-32728/src2/include/mysql/psi/mysql_thread.h:750
#2  0x000055d9b953331a in THD::THD (this=0x62b0000d9218, id=7, is_wsrep_applier=false) at /data/MDEV-32728/src2/sql/sql_class.cc:839
#3  0x000055d9b9aea7cf in CONNECT::create_thd (this=0x608000002bb8, thd=0x0) at /data/MDEV-32728/src2/sql/sql_connect.cc:1533
#4  0x000055d9b9ae96e8 in do_handle_one_connection (connect=0x608000002bb8, put_in_cache=true) at /data/MDEV-32728/src2/sql/sql_connect.cc:1362
#5  0x000055d9b9ae9430 in handle_one_connection (arg=0x608000002bb8) at /data/MDEV-32728/src2/sql/sql_connect.cc:1318
#6  0x000055d9ba78cae6 in pfs_spawn_thread (arg=0x617000007798) at /data/MDEV-32728/src2/storage/perfschema/pfs.cc:2201
#7  0x000024832c0b8609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x000030d05f19e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) when
Current event: 47186
(rr) frame 2
#2  0x000055d9b953331a in THD::THD (this=0x62b0000d9218, id=7, is_wsrep_applier=false) at /data/MDEV-32728/src2/sql/sql_class.cc:839
839	  mysql_mutex_record_order(&LOCK_thd_kill, &LOCK_thd_data);

At this point, lock_sys.wait_mutex cannot possibly held by this thread. Whenever lock_sys is being accessed, a THD object must already have been created.

This THD::THD() is executed by Thread 4 in the rr replay trace, while the print_deadlock_warning() is invoked in Thread 2. The first acquisition of lock_sys.wait_mutex is after the above initialization. In Thread 2, lock_sys.wait_mutex is not acquired or released before the call that leads to print_deadlock_warning(), at when 487353. The first acquisition of lock_sys.wait_mutex (in any thread) occurs at when 131416, during the execution of ALTER TABLE `thash_2_1` REMOVE PARTITIONING, in a different thread. The first acquisition in Thread 4 occurs at when 425859, during SHOW TABLE STATUS.

To me, this looks like a bogus report of lock order inversion. I think that this needs to be analyzed by someone who is familiar with the inner workings of safe_mutex() or the hash table implementation that it uses.

Comment by Marko Mäkelä [ 2023-11-08 ]

I thought that safe_mutex could lump together all THD::LOCK_thd_data of different THD objects. I did git grep -w LOCK_thd_data of the source code and placed breakpoints on all code paths where THD::LOCK_thd_data would be acquired.

Some breakpoints were hit, but they were false alarms:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

Thread 20 hit Breakpoint 41, THD::reset_kill_query (this=0x62b00019d218) at /data/MDEV-32728/src2/sql/sql_class.h:4615
4615	  inline void reset_kill_query()
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) thr f 2446591
Thread 2 has target id 'Thread 2446518.2446591'
(rr) thr a 2 bt
 
Thread 2 (Thread 2446518.2446591):
#0  __sanitizer::StackDepotNode::eq (hash=<optimized out>, args=<optimized out>, this=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_stackdepot.cc:42
#1  __sanitizer::StackDepotNode::eq (args=<error reading variable: access outside bounds of object referenced via synthetic pointer>, hash=2957846211, this=0x1d397528d9c8) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_stackdepot.cc:36
#2  __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::find (s=s@entry=0x1d397528d9c8, args={trace = 0x64000029ee60, size = 22, tag = 0}, hash=hash@entry=2957846211) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_stackdepotbase.h:64
#3  0x00004c752f581612 in __sanitizer::StackDepotBase<__sanitizer::StackDepotNode, 1, 20>::Put (this=this@entry=0x4c752f683440 <__sanitizer::theDepot>, args=<optimized out>, inserted=inserted@entry=0x0) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_atomic_clang_x86.h:26
#4  0x00004c752f58124c in __sanitizer::StackDepotPut (stack=<optimized out>) at ../../../../src/libsanitizer/sanitizer_common/sanitizer_stackdepot.cc:110
#5  0x00004c752f47f964 in __asan::Allocator::QuarantineChunk (this=0x4c752f5dde00 <__asan::instance>, m=0x6160017b6870, ptr=<optimized out>, stack=0x64000029ee50) at ../../../../src/libsanitizer/asan/asan_allocator.cc:572
#6  0x00004c752f5603d9 in __interceptor_free (ptr=0x6160017b6880) at ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:127
#7  0x000055d9bb444625 in FreeState (cs=0x606000233000, free_state=1) at /data/MDEV-32728/src2/dbug/dbug.c:1642
#8  0x000055d9bb43e520 in _db_pop_ () at /data/MDEV-32728/src2/dbug/dbug.c:935
#9  0x000055d9bb3debea in safe_mutex_lock (mp=0x55d9be2e8580 <lock_sys+192>, my_flags=0, file=0x55d9bc4da120 "/data/MDEV-32728/src2/storage/innobase/dict/drop.cc", line=289) at /data/MDEV-32728/src2/mysys/thr_mutex.c:398
#10 0x000055d9bb0b1f2b in inline_mysql_mutex_lock (that=0x55d9be2e8580 <lock_sys+192>, src_file=0x55d9bc4da120 "/data/MDEV-32728/src2/storage/innobase/dict/drop.cc", src_line=289) at /data/MDEV-32728/src2/include/mysql/psi/mysql_thread.h:750
#11 0x000055d9bb0b48d8 in trx_t::commit (this=0x5f5473c86b40, deleted=std::vector of length 1, capacity 1 = {{m_file = 1297, m_psi = 0x0}}) at /data/MDEV-32728/src2/storage/innobase/dict/drop.cc:289
#12 0x000055d9ba9b81f5 in ha_innobase::delete_table (this=0x62f000072438, name=0x6400002a0600 "./partition_db/#sql-backup-2554b6-4#P#p47") at /data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc:13780
#13 0x000055d9ba730473 in ha_partition::del_ren_table (this=0x62f00002b438, from=0x6400002a1be0 "./partition_db/#sql-backup-2554b6-4", to=0x0) at /data/MDEV-32728/src2/sql/ha_partition.cc:2509
#14 0x000055d9ba723d57 in ha_partition::delete_table (this=0x62f00002b438, name=0x6400002a1be0 "./partition_db/#sql-backup-2554b6-4") at /data/MDEV-32728/src2/sql/ha_partition.cc:633
#15 0x000055d9b9f149cc in hton_drop_table (hton=0x615000002618, path=0x6400002a1be0 "./partition_db/#sql-backup-2554b6-4") at /data/MDEV-32728/src2/sql/handler.cc:573
#16 0x000055d9b9f24236 in ha_delete_table (thd=0x62b0000af218, hton=0x615000002618, path=0x6400002a1be0 "./partition_db/#sql-backup-2554b6-4", db=0x6400002a3b90, alias=0x6400002a2120, generate_warning=false) at /data/MDEV-32728/src2/sql/handler.cc:3162
#17 0x000055d9b98fb024 in quick_rm_table (thd=0x62b0000af218, base=0x615000002618, db=0x6400002a3b90, table_name=0x6400002a2120, flags=3, table_path=0x0) at /data/MDEV-32728/src2/sql/sql_table.cc:1941
#18 0x000055d9b993ed77 in mysql_alter_table (thd=0x62b0000af218, new_db=0x62b0000b3ea0, new_name=0x62b0000b4300, create_info=0x6400002a5110, table_list=0x62b0000b63c8, recreate_info=0x6400002a4ec0, alter_info=0x6400002a4f60, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/MDEV-32728/src2/sql/sql_table.cc:10912
#19 0x000055d9b9b0906c in Sql_cmd_alter_table::execute (this=0x62b0000b6af0, thd=0x62b0000af218) at /data/MDEV-32728/src2/sql/sql_alter.cc:675
#20 0x000055d9b96864e0 in mysql_execute_command (thd=0x62b0000af218, is_called_from_prepared_stmt=false) at /data/MDEV-32728/src2/sql/sql_parse.cc:6026
#21 0x000055d9b9693d1d in mysql_parse (thd=0x62b0000af218, rawbuf=0x62b0000b6238 "/* WRK-6 QNO 75 */  ALTER TABLE `thash_2_1` REMOVE PARTITIONING", length=63, parser_state=0x6400002a67f0) at /data/MDEV-32728/src2/sql/sql_parse.cc:8051

This was duplicated a few times (none of the callers are Thread 2):

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

Thread 20 hit Breakpoint 33, THD::reset_killed (this=0xd88aa0e80a2) at /data/MDEV-32728/src2/sql/sql_class.cc:2153
2153	{
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) c
Continuing.
 
Thread 20 hit Breakpoint 41, THD::reset_kill_query (this=0x62b00019daa0) at /data/MDEV-32728/src2/sql/sql_class.h:4615
4615	  inline void reset_kill_query()
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) 
Continuing.
 
Thread 20 hit Breakpoint 33, THD::reset_killed (this=0x62b00019d218) at /data/MDEV-32728/src2/sql/sql_class.cc:2153
2153	{
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) 
Continuing.
[Switching to Thread 2446518.2446661]
 
Thread 19 hit Breakpoint 41, THD::reset_kill_query (this=0x62b00018f218) at /data/MDEV-32728/src2/sql/sql_class.h:4615
4615	  inline void reset_kill_query()
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) 
Continuing.
 
Thread 19 hit Breakpoint 33, THD::reset_killed (this=0x75a2f7e4ea2) at /data/MDEV-32728/src2/sql/sql_class.cc:2153
2153	{
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) 
Continuing.
 
Thread 19 hit Breakpoint 41, THD::reset_kill_query (this=0x62b00018faa0) at /data/MDEV-32728/src2/sql/sql_class.h:4615
4615	  inline void reset_kill_query()
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) 
Continuing.
 
Thread 19 hit Breakpoint 33, THD::reset_killed (this=0x62b00018f218) at /data/MDEV-32728/src2/sql/sql_class.cc:2153
2153	{
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446591, __nusers = 2, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}

Another breakpoint was hit, but it was a false alarm as well:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

Thread 8 hit Breakpoint 27, close_thread_table (thd=0x55d9bb6ce8e0, table_ptr=0x41b58ab3) at /data/MDEV-32728/src2/sql/sql_base.cc:965
965	{
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 2, __count = 0, __owner = 2446661, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) bt
#0  close_thread_table (thd=0x55d9bb6ce8e0, table_ptr=0x41b58ab3) at /data/MDEV-32728/src2/sql/sql_base.cc:965
#1  0x000055d9b94c48e8 in close_thread_tables (thd=0x62b0000bd218) at /data/MDEV-32728/src2/sql/sql_base.cc:956
#2  0x000055d9b9686d28 in mysql_execute_command (thd=0x62b0000bd218, is_called_from_prepared_stmt=false) at /data/MDEV-32728/src2/sql/sql_parse.cc:6092
#3  0x000055d9b9693d1d in mysql_parse (thd=0x62b0000bd218, 
    rawbuf=0x62b0000c4238 "/* WRK-4 QNO 170 */  SELECT `col_int_nokey` MOD 10 AS `col_int_nokey`, `col_int_key` MOD 10 AS `col_int_key` FROM `thash_2_6` PARTITION (p22, p17)", length=146, 
    parser_state=0x7c25163527f0) at /data/MDEV-32728/src2/sql/sql_parse.cc:8051
(rr) thread find 2446661
Thread 19 has target id 'Thread 2446518.2446661'

Thread 19 is waiting for a lock for an InnoDB persistent statistics table in CREATE TABLE IF NOT EXISTS `thash_5_5`. There were two more "false alarms" due to the same lock_sys.wait_mutex holder:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

Thread 8 hit Breakpoint 41, THD::reset_kill_query (this=0x62b0000bdaa0) at /data/MDEV-32728/src2/sql/sql_class.h:4615
4615	  inline void reset_kill_query()
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 2, __count = 0, __owner = 2446661, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
Thread 8 hit Breakpoint 33, THD::reset_killed (this=0x62b0000bd218) at /data/MDEV-32728/src2/sql/sql_class.cc:2153
2153	{
1: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 2, __count = 0, __owner = 2446661, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}

Below is the entire list of my breakpoints. Those on which I set a condition were hit during the execution:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

1       breakpoint     keep y   0x000055d9bb3e1839 in print_deadlock_warning at /data/MDEV-32728/src2/mysys/thr_mutex.c:808
	breakpoint already hit 1 time
9       breakpoint     keep y   <MULTIPLE>         
9.1                         y   0x000055d9ba8022b2 in PFS_status_variable_cache::do_materialize_session(THD*) at /data/MDEV-32728/src2/storage/perfschema/pfs_variable.cc:1007
9.2                         y   0x000055d9ba802572 in PFS_status_variable_cache::do_materialize_session(PFS_thread*) at /data/MDEV-32728/src2/storage/perfschema/pfs_variable.cc:1054
9.3                         y   0x000055d9ba8042e2 in PFS_status_variable_cache::do_materialize_session(PFS_thread*, unsigned int) at /data/MDEV-32728/src2/storage/perfschema/pfs_variable.h:670
10      breakpoint     keep y   0x000055d9ba801ff2 in PFS_status_variable_cache::do_materialize_all(THD*) at /data/MDEV-32728/src2/storage/perfschema/pfs_variable.cc:961
11      breakpoint     keep y   0x000055d9b9f0efa2 in wsrep_must_abort(THD*) at /data/MDEV-32728/src2/sql/wsrep_trans_observer.h:57
12      breakpoint     keep y   0x000055d9b9f531bb in ha_abort_transaction(THD*, THD*, char) at /data/MDEV-32728/src2/sql/handler.cc:8009
13      breakpoint     keep y   0x000055d9b934bc36 in warn_threads_active_after_phase_2(THD*, void*) at /data/MDEV-32728/src2/sql/mysqld.cc:1585
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
14      breakpoint     keep y   0x000055d9ba2471de in TC_LOG::run_commit_ordered(THD*, bool) at /data/MDEV-32728/src2/sql/log.cc:9365
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
15      breakpoint     keep y   <MULTIPLE>         
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
15.1                        y   0x000055d9b93fdde2 in wsrep_must_replay(THD*) at /data/MDEV-32728/src2/sql/wsrep_trans_observer.h:66
15.2                        y   0x000055d9b952d75b in wsrep_must_replay(THD*) at /data/MDEV-32728/src2/sql/wsrep_trans_observer.h:66
15.3                        y   0x000055d9ba2b3a87 in wsrep_must_replay(THD*) at /data/MDEV-32728/src2/sql/log_event_server.cc:4006
16      breakpoint     keep y   0x000055d9b9398234 in THD::unregister_slave() at /data/MDEV-32728/src2/sql/repl_failsafe.cc:98
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
17      breakpoint     keep y   0x000055d9b93989ba in THD::is_binlog_dump_thread() at /data/MDEV-32728/src2/sql/repl_failsafe.cc:169
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
18      breakpoint     keep y   0x000055d9b9398a47 in show_slave_hosts_callback(THD*, Protocol*) at /data/MDEV-32728/src2/sql/repl_failsafe.cc:179
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
19      breakpoint     keep y   0x000055d9ba9236af in wsrep_thd_LOCK(THD const*) at /data/MDEV-32728/src2/sql/service_wsrep.cc:31
20      breakpoint     keep y   0x000055d9ba9236e0 in wsrep_thd_TRYLOCK(THD const*) at /data/MDEV-32728/src2/sql/service_wsrep.cc:36
21      breakpoint     keep y   0x000055d9ba923fd4 in wsrep_thd_is_BF(THD const*, my_bool) at /data/MDEV-32728/src2/sql/service_wsrep.cc:173
22      breakpoint     keep y   0x000055d9ba924152 in wsrep_handle_SR_rollback(THD*, THD*) at /data/MDEV-32728/src2/sql/service_wsrep.cc:191
23      breakpoint     keep y   0x000055d9b93c512e in terminate_slave_thread(THD*, mysql_mutex_t*, mysql_cond_t*, uint volatile*, bool) at /data/MDEV-32728/src2/sql/slave.cc:1038
24      breakpoint     keep y   0x000055d9b93d782c in apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int) at /data/MDEV-32728/src2/sql/slave.cc:3880
25      breakpoint     keep y   0x000055d9b93d8953 in exec_relay_log_event(THD*, Relay_log_info*, rpl_group_info*) at /data/MDEV-32728/src2/sql/slave.cc:4200
26      breakpoint     keep y   0x000055d9b93dfbaf in handle_slave_sql(void*) at /data/MDEV-32728/src2/sql/slave.cc:5295
27      breakpoint     keep y   0x000055d9b94c49d8 in close_thread_table(THD*, TABLE**) at /data/MDEV-32728/src2/sql/sql_base.cc:965
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 1 time
28      breakpoint     keep y   0x000055d9b952fba6 in thd_set_ha_data(THD*, handlerton const*, void const*) at /data/MDEV-32728/src2/sql/sql_class.cc:466
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
29      breakpoint     keep y   0x000055d9b9538de8 in THD::set_db(st_mysql_const_lex_string const*) at /data/MDEV-32728/src2/sql/sql_class.cc:1475
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
30      breakpoint     keep y   0x000055d9b9539254 in THD::reset_db(st_mysql_const_lex_string const*) at /data/MDEV-32728/src2/sql/sql_class.cc:1525
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
31      breakpoint     keep y   0x000055d9b953d2d4 in THD::disconnect() at /data/MDEV-32728/src2/sql/sql_class.cc:2024
32      breakpoint     keep y   0x000055d9b953d43c in THD::notify_shared_lock(MDL_context_owner*, bool) at /data/MDEV-32728/src2/sql/sql_class.cc:2052
33      breakpoint     keep y   0x000055d9b953e1c0 in THD::reset_killed() at /data/MDEV-32728/src2/sql/sql_class.cc:2153
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 5 times
34      breakpoint     keep y   0x000055d9b95559c0 in thd_progress_report(THD*, ulonglong, ulonglong) at /data/MDEV-32728/src2/sql/sql_class.cc:4802
35      breakpoint     keep y   0x000055d9b9555bcb in thd_progress_next_stage(THD*) at /data/MDEV-32728/src2/sql/sql_class.cc:4833
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 1 time
36      breakpoint     keep y   0x000055d9b9558133 in thd_query_safe(THD*, char*, size_t) at /data/MDEV-32728/src2/sql/sql_class.cc:5256
37      breakpoint     keep y   0x000055d9b955b702 in THD::set_statement(Statement*) at /data/MDEV-32728/src2/sql/sql_class.cc:5924
38      breakpoint     keep y   0x000055d9b955c76a in THD::set_query_and_id(char*, unsigned int, charset_info_st const*, long long) at /data/MDEV-32728/src2/sql/sql_class.cc:6069
39      breakpoint     keep y   0x000055d9b95be449 in handle_delayed_insert(void*) at /data/MDEV-32728/src2/sql/sql_insert.cc:3126
40      breakpoint     keep y   0x000055d9b95d1956 in select_create::send_eof() at /data/MDEV-32728/src2/sql/sql_insert.cc:4990
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 1 time
41      breakpoint     keep y   0x000055d9b96a65d0 in THD::reset_kill_query() at /data/MDEV-32728/src2/sql/sql_class.h:4615
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 5 times
42      breakpoint     keep y   0x000055d9b969b865 in kill_one_thread(THD*, unsigned long long, killed_state, killed_type) at /data/MDEV-32728/src2/sql/sql_parse.cc:9223
43      breakpoint     keep y   0x000055d9b969be88 in kill_threads_callback(THD*, kill_threads_callback_arg*) at /data/MDEV-32728/src2/sql/sql_parse.cc:9317
44      breakpoint     keep y   0x000055d9b9707830 in THD::set_current_linfo(st_log_info*) at /data/MDEV-32728/src2/sql/sql_repl.cc:534
45      breakpoint     keep y   0x000055d9b97078b9 in adjust_callback(THD*, my_off_t*) at /data/MDEV-32728/src2/sql/sql_repl.cc:563
46      breakpoint     keep y   0x000055d9b9707b75 in log_in_use_callback(THD*, char const*) at /data/MDEV-32728/src2/sql/sql_repl.cc:589
47      breakpoint     keep y   0x000055d9b9719a08 in kill_callback(THD*, kill_callback_arg*) at /data/MDEV-32728/src2/sql/sql_repl.cc:3510
48      breakpoint     keep y   0x000055d9b9719b27 in kill_zombie_dump_threads(unsigned int) at /data/MDEV-32728/src2/sql/sql_repl.cc:3524
49      breakpoint     keep y   0x000055d9b98474af in list_callback(THD*, list_callback_arg*) at /data/MDEV-32728/src2/sql/sql_show.cc:2793
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 1 time
50      breakpoint     keep y   0x000055d9b984bbe0 in processlist_callback(THD*, processlist_callback_arg*) at /data/MDEV-32728/src2/sql/sql_show.cc:3238
51      breakpoint     keep y   0x000055d9b9893c22 in thd_get_error_context_description(THD*, char*, unsigned int, unsigned int) at /data/MDEV-32728/src2/sql/sql_show.cc:10382

Because Galera did not seem to be used, I did not set breakpoints on functions that are defined in sql/wsrep*.

If there were a lock order inversion problem, that should be due to some call from InnoDB where lock_sys.wait_mutex is being held. One candidate would be thd_deadlock_victim_preference(), but it is not being called in the trace, and it does not acquire any mutexes. In fact, InnoDB Deadlock::report() was never invoked (to report deadlocks between user transactions). Neither was lock_wait_wsrep_kill(), wsrep_abort_transaction() or lock_wait_wsrep() invoked.

I am convinced that this is a bogus warning. Hopefully someone will find the bug in safe_mutex to confirm this.

Comment by Marko Mäkelä [ 2023-11-08 ]

I repeated the same exercise for THD::LOCK_thd_kill:

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

(rr) i b
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x000055d9b934b738 in kill_thread(THD*) at /data/MDEV-32728/src2/sql/mysqld.cc:1520
2       breakpoint     keep y   0x000055d9bb3e1839 in print_deadlock_warning at /data/MDEV-32728/src2/mysys/thr_mutex.c:808
	breakpoint already hit 1 time
3       breakpoint     keep y   0x000055d9ba923741 in wsrep_thd_kill_LOCK(THD const*) at /data/MDEV-32728/src2/sql/service_wsrep.cc:46
4       breakpoint     keep y   0x000055d9b93c512e in terminate_slave_thread(THD*, mysql_mutex_t*, mysql_cond_t*, uint volatile*, bool) at /data/MDEV-32728/src2/sql/slave.cc:1038
5       breakpoint     keep y   0x000055d9b95bda8d in kill_delayed_threads() at /data/MDEV-32728/src2/sql/sql_insert.cc:2996
6       breakpoint     keep y   0x000055d9b9d80705 in tp_timeout_handler(TP_connection*) at /data/MDEV-32728/src2/sql/threadpool_common.cc:541
7       breakpoint     keep y   0x000055d9ba8d9c26 in wsrep_handle_mdl_conflict(MDL_context*, MDL_ticket const*, MDL_key const*) at /data/MDEV-32728/src2/sql/wsrep_mysqld.cc:3050
8       breakpoint     keep y   0x000055d9ba8db4c7 in abort_replicated(THD*) at /data/MDEV-32728/src2/sql/wsrep_mysqld.cc:3169
9       breakpoint     keep y   0x000055d9ba8db84e in wsrep_close_thread(THD*) at /data/MDEV-32728/src2/sql/wsrep_mysqld.cc:3217
10      breakpoint     keep y   0x000055d9ba908671 in wsrep_backup_kill_for_commit(THD*) at /data/MDEV-32728/src2/sql/wsrep_thd.cc:475
11      breakpoint     keep y   0x000055d9ba908c6c in wsrep_restore_kill_after_commit(THD*) at /data/MDEV-32728/src2/sql/wsrep_thd.cc:498
12      breakpoint     keep y   0x000055d9b9719a08 in kill_callback(THD*, kill_callback_arg*) at /data/MDEV-32728/src2/sql/sql_repl.cc:3510
13      breakpoint     keep y   0x000055d9b9ae9507 in thd_is_connection_alive(THD*) at /data/MDEV-32728/src2/sql/sql_connect.cc:1348
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 3 times
14      breakpoint     keep y   0x000055d9b969b59f in find_thread_callback(THD*, find_thread_callback_arg*) at /data/MDEV-32728/src2/sql/sql_parse.cc:9193
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
15      breakpoint     keep y   0x000055d9b953d43c in THD::notify_shared_lock(MDL_context_owner*, bool) at /data/MDEV-32728/src2/sql/sql_class.cc:2052
16      breakpoint     keep y   0x000055d9b953e1c0 in THD::reset_killed() at /data/MDEV-32728/src2/sql/sql_class.cc:2153
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
	breakpoint already hit 5 times
17      breakpoint     keep y   0x000055d9b953efe0 in THD::reset_globals() at /data/MDEV-32728/src2/sql/sql_class.cc:2252
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
18      breakpoint     keep y   0x000055d9b955c88c in THD::set_mysys_var(st_my_thread_var*) at /data/MDEV-32728/src2/sql/sql_class.cc:6082
19      breakpoint     keep y   <MULTIPLE>         
	stop only if lock_sys.wait_mutex.m_mutex.mutex.__data.__owner (target evals)
19.1                        y   0x000055d9b953a528 in THD::~THD() at /data/MDEV-32728/src2/sql/sql_class.cc:1685
19.2                        y   0x000055d9b953b404 in THD::~THD() at /data/MDEV-32728/src2/sql/sql_class.cc:1685

There were a few false alarms where lock_sys.wait_mutex would be held by a different thread (Thread 13 committing ALTER TABLE `thash_2_1` REMOVE PARTITIONING or Thread 20 waiting for a statistics table lock in CREATE TABLE IF NOT EXISTS `thash_5_5`).

Comment by Sergei Golubchik [ 2023-11-08 ]

3-way lock order cycle.

  • LOCK_thd_data is locked under LOCK_thd_kill (see e.g. THD constructor)
  • LOCK_thd_kill is locked under lock_sys.wait_mutex (in lock_wait(), when it calls trx_is_interrupted())
  • lock_sys.wait_mutex is locked under LOCK_thd_data (in innobase_kill_query())

There's a shorter 2-way cycle, because innobase_kill_query() is also under LOCK_thd_kill

Comment by Marko Mäkelä [ 2023-11-08 ]

Like I pointed out earlier, this is not really caused by MDEV-32096. The change in MDEV-32096 merely made innobase_kill_query() acquire lock_sys.wait_mutex more often (regardless of the value of trx->lock.wait_lock). Previously, the function lock_sys_t::cancel(trx_t *trx) that innobase_kill_query() invoked would acquire lock_sys.wait_mutex.

serg, I think that fixing this requires changes outside InnoDB. Do you have ideas?

Comment by Marko Mäkelä [ 2023-11-08 ]

I missed the fact that the THD mutexes may also be accessed via pointers.

bb-10.6-release 02e97252982f123979da6a838cd6cd24ae11a932

Thread 2 hit Breakpoint 3, Apc_target::process_apc_requests (this=0x2c1e1e50) at /data/MDEV-32728/src2/sql/my_apc.cc:201
201	{
2: lock_sys.wait_mutex.m_mutex.mutex.__data = {__lock = 1, __count = 0, __owner = 2446594, __nusers = 3, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}
(rr) when
Current event: 425859
(rr) thread find 2446594
Thread 2 has target id 'Thread 2446518.2446594'
(rr) bt
#0  Apc_target::process_apc_requests (this=0x2c1e1e50) at /data/MDEV-32728/src2/sql/my_apc.cc:201
#1  0x000055d9b95552f1 in thd_kill_level (thd=0x62b0000d9218) at /data/MDEV-32728/src2/sql/sql_class.cc:4729
#2  0x000055d9ba97aab4 in trx_is_interrupted (trx=0x5f5473c88140) at /data/MDEV-32728/src2/storage/innobase/handler/ha_innodb.cc:3582
#3  0x000055d9bab2f995 in lock_wait (thr=0x61b000c6bd20) at /data/MDEV-32728/src2/storage/innobase/lock/lock0lock.cc:1966
#4  0x000055d9bad54f39 in row_mysql_handle_errors (new_err=0x57b72c1e2920, trx=0x5f5473c88140, thr=0x61b000c6bd20, savept=0x0) at /data/MDEV-32728/src2/storage/innobase/row/row0mysql.cc:688
#5  0x000055d9bab3e94c in lock_table_for_trx (table=0x618000569d20, trx=0x5f5473c88140, mode=LOCK_X, no_wait=false) at /data/MDEV-32728/src2/storage/innobase/lock/lock0lock.cc:3969
#6  0x000055d9bb08ddd8 in dict_stats_save (table_orig=0x6180003b8920, only_for_index=0x0) at /data/MDEV-32728/src2/storage/innobase/dict/dict0stats.cc:3312

MDEV-32530 added this call to trx_is_interrupted() while the thread is holding an InnoDB latch.

Comment by Marko Mäkelä [ 2023-11-08 ]

I think that the following (untested) patch should avoid the 3-way lock order inversion:

diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
index df51ceb16d8..4e6784b6c3e 100644
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -1948,10 +1948,18 @@ dberr_t lock_wait(que_thr_t *thr)
      wait_lock->un_member.tab_lock.table->id <= DICT_FIELDS_ID);
   thd_wait_begin(trx->mysql_thd, (type_mode & LOCK_TABLE)
                  ? THD_WAIT_TABLE_LOCK : THD_WAIT_ROW_LOCK);
+  /* Dictionary transactions must ignore KILL, because they could
+  be executed as part of a multi-transaction DDL operation,
+  such as rollback_inplace_alter_table() or ha_innobase::delete_table().
+
+  Invoking trx_is_interrupted() while holding lock_sys.wait_mutex
+  would lead to lock order inversion with respect to
+  THD::LOCK_thd_data and THD::LOCK_thd_kill. */
+  const bool interrupted{!trx->dict_operation && trx_is_interrupted(trx)};
 
   mysql_mutex_lock(&lock_sys.wait_mutex);
   /* Now that we are holding lock_sys.wait_mutex, we must reload
-  trx->lock.wait_mutex. It cannot be cleared as long as we are holding
+  trx->lock.wait_lock. It cannot be cleared as long as we are holding
   lock_sys.wait_mutex, but as long as we do not hold exclusive
   lock_sys.latch, a waiting record lock can be replaced with an
   equivalent waiting record lock during a page split or merge by
@@ -1960,10 +1968,7 @@ dberr_t lock_wait(que_thr_t *thr)
 
   if (wait_lock)
   {
-    /* Dictionary transactions must ignore KILL, because they could
-    be executed as part of a multi-transaction DDL operation,
-    such as rollback_inplace_alter_table() or ha_innobase::delete_table(). */
-    if (!trx->dict_operation && trx_is_interrupted(trx))
+    if (interrupted)
     {
       /* innobase_kill_query() can only set trx->error_state=DB_INTERRUPTED
       for any transaction that is attached to a connection.

Comment by Marko Mäkelä [ 2023-11-08 ]

In the end, the problem may exist already before MDEV-32096, because lock_wait() was already invoking trx_is_interrupted() elsewhere while holding lock_sys.wait_mutex.

I think that a simpler fix would be to make these particular calls or all calls of trx_is_interrupted() pass a special parameter to thd_kill_level() that would make it skip the problematic call to apc_target->process_apc_requests(). It would not affect the outcome of "this statement or connection was killed".

Comment by Marko Mäkelä [ 2023-11-08 ]

To avoid the other possible case of lock order inversion, we could do the following:

@@ -2049,11 +2054,22 @@ dberr_t lock_wait(que_thr_t *thr)
       /* Dictionary transactions must ignore KILL, because they could
       be executed as part of a multi-transaction DDL operation,
       such as rollback_inplace_alter_table() or ha_innobase::delete_table(). */
-      if (!trx->dict_operation && trx_is_interrupted(trx))
+      if (!trx->dict_operation)
+      {
+        /* Temporarily release the mutex to avoid lock order inversion. */
+        mysql_mutex_unlock(&lock_sys.wait_mutex);
+        const bool interrupted{trx_is_interrupted(trx)};
+        mysql_mutex_lock(&lock_sys.wait_mutex);
+        wait_lock= trx->lock.wait_lock;
         /* innobase_kill_query() can only set trx->error_state=DB_INTERRUPTED
         for any transaction that is attached to a connection. */
-        trx->error_state= DB_INTERRUPTED;
-      else if (!err)
+        if (interrupted)
+        {
+          trx->error_state= DB_INTERRUPTED;
+          break;
+        }
+      }
+      if (!err)
         continue;
 #ifdef WITH_WSREP
       else if (trx->is_wsrep() && wsrep_is_BF_lock_timeout(*trx));

However, this could introduce a measurable performance regression, by increasing the rate at which the rather contended lock_sys.wait_mutex is being acquired and released.

It could also be the case that because this thread already called trx_is_interrupted(), on the second call apc_target->have_apc_requests() would be guaranteed to not hold. If that is the case, we would definitely not need this additional patch.

Comment by Kristian Nielsen [ 2023-11-08 ]

The patch moves the check of trx_is_interrupted() to before taking the lock_sys.wait_mutex.m_mutex.
But checking for kill always needs to be done while holding the mutex that's being used in pthread_cond_wait:

      err= my_cond_timedwait(&trx->lock.cond, &lock_sys.wait_mutex.m_mutex, &abstime);

Otherwise the kill may occur just after the check and be ignored, and the kill is lost.
This will break parallel replication, as deadlocks remain unhandled and the slave will hang.

It sems necessary to be able to call thd_kill_level() while holding other mutex, as this is a requirement for correctly handling kill.

I understand the convenience of trying to run pending apc from thd_kill_level(), since it is something we can expect to be called frequently. But it doesn't seem safe.

What about using mysql_mutex_trylock() in thd_kill_level(), and only running the apc if the lock can be obtained?
The request can then be dequeued under LOCK_thd_kill. Then the apc should be run while temporarily unlocking LOCK_thd_kill. Update: No, the lock cannot be released while running the apc, it protects the lifetime of the request. But that might be ok, just a retriction on which mutexes can be taken in an apc.

Comment by Kristian Nielsen [ 2023-11-08 ]

Here an RFC patch for the above idea:

https://github.com/MariaDB/server/commit/d35f30214a2a1da0f5dbc6e69d1bd9a3a5e98b06

Comment by Sergei Golubchik [ 2023-11-08 ]

Thanks! Looks good, I'll apply and push it, if you don't mind

Comment by Kristian Nielsen [ 2023-11-08 ]

Sure, please do.
Meanwhile, I'll see if I can come up with a quick testcase, depends on how tricky it is to get the threads coordinated.

Comment by Kristian Nielsen [ 2023-11-08 ]

Pushed a testcase to knielsen_mdev32728:

https://github.com/MariaDB/server/commit/2d0eb0ddfc13d7b5a788450a281a3c5b5854482e

Generated at Thu Feb 08 10:33:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.