[MDEV-26014] Race condition between KILL and client disconnect Created: 2021-06-24  Updated: 2022-11-22  Resolved: 2022-09-29

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - InnoDB
Affects Version/s: 10.5.12, 10.5.13, 10.6
Fix Version/s: 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Sergei Golubchik
Resolution: Duplicate Votes: 1
Labels: ASAN, race, rr-profile-analyzed

Issue Links:
Duplicate
duplicates MDEV-29368 Assertion `trx->mysql_thd == thd' fai... Closed
Relates
relates to MDEV-24571 ASAN failure during KILL CONNECTION Open
relates to MDEV-17092 ERROR: AddressSanitizer: use-after-po... Closed
relates to MDEV-23536 Race condition between KILL and trans... Closed

 Description   

We got a crash like this:

10.6 101da87228f11a1d7582f5dccfda7f30ac3b5668

==2645651==ERROR: AddressSanitizer: use-after-poison on address 0x1b9a262b3db0 at pc 0x5583f42fdaeb bp 0x20cf03661100 sp 0x20cf036610f0
READ of size 8 at 0x1b9a262b3db0 thread T44
2021-06-24  4:15:01 240 [Warning] Aborted connection 240 to db: 'test' user: 'root' host: 'localhost' (This connection closed normally)
    #0 0x5583f42fdaea in innobase_kill_query /data/Server/10.6D/storage/innobase/handler/ha_innodb.cc:4895
    #1 0x5583f38ef057 in kill_handlerton /data/Server/10.6D/sql/handler.cc:933
    #2 0x5583f30e4dce in plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int**, void*), int, unsigned int, void*) /data/Server/10.6D/sql/sql_plugin.cc:2510
    #3 0x5583f38f9c2d in ha_kill_query(THD*, thd_kill_levels) /data/Server/10.6D/sql/handler.cc:940
    #4 0x5583f2f7a2cc in THD::awake_no_mutex(killed_state) /data/Server/10.6D/sql/sql_class.cc:1925
    #5 0x5583f309484a in kill_one_thread(THD*, long long, killed_state, killed_type) /data/Server/10.6D/sql/sql_parse.cc:9256
Shadow bytes around the buggy address:
=>0x0373c4c4e7b0: 00 f7 04 f7 f7 f7[f7]f7 f7 04 00 f7 f7 f7 f7 f7

ssh sdp
rr replay /data/Results/1624532926/TBR-975/dev/shm/vardir/1624532926/5/1/rr/latest-trace

(rr) continue
Thread 3 received signal SIGABRT, Aborted.
(rr) watch *(char*)0x0373c4c4e7b6
Hardware watchpoint 1: *(char*)0x0373c4c4e7b6
(rr) when
Current event: 562471
(rr) reverse-continue
Continuing.
[Switching to Thread 2645651.2646885]
 
Thread 34 hit Hardware watchpoint 1: *(char*)0x0373c4c4e7b6
 
Old value = -9 '\367'
New value = 0 '\000'
__memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:244
244	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
(rr) thr a 3 bt
 
Thread 3 (Thread 2645651.2647440):
#0  __lll_lock_wait (futex=futex@entry=0x5583f6dda640 <THR_LOCK_mutex>, private=0) at lowlevellock.c:52
#1  0x00003424383dd235 in __GI___pthread_mutex_lock (mutex=0x5583f6dda640 <THR_LOCK_mutex>) at ../nptl/pthread_mutex_lock.c:135
#2  0x00005583f4e348c0 in safe_mutex_lock (mp=mp@entry=0x5583f6dda2a0 <LOCK_alarm>, my_flags=my_flags@entry=0, file=file@entry=0x5583f58e4f40 "/data/Server/10.6D/mysys/thr_alarm.c", line=line@entry=455) at /data/Server/10.6D/mysys/thr_mutex.c:333
#3  0x00005583f4e2e459 in inline_mysql_mutex_lock (src_line=455, src_file=0x5583f58e4f40 "/data/Server/10.6D/mysys/thr_alarm.c", that=0x5583f6dda2a0 <LOCK_alarm>) at /data/Server/10.6D/include/mysql/psi/mysql_thread.h:750
#4  thr_alarm_kill (thread_id=107) at /data/Server/10.6D/mysys/thr_alarm.c:455
#5  0x00005583f2f7a53e in THD::awake_no_mutex (this=this@entry=0x62b000230218, state_to_set=state_to_set@entry=KILL_CONNECTION) at /data/Server/10.6D/sql/sql_class.cc:1916
#6  0x00005583f309484b in kill_one_thread (thd=thd@entry=0x62b0002df218, id=id@entry=107, kill_signal=kill_signal@entry=KILL_CONNECTION, type=type@entry=KILL_TYPE_ID) at /data/Server/10.6D/sql/sql_parse.cc:9256
#7  0x00005583f3095320 in sql_kill (thd=thd@entry=0x62b0002df218, id=107, state=state@entry=KILL_CONNECTION, type=type@entry=KILL_TYPE_ID) at /data/Server/10.6D/sql/sql_parse.cc:9378
#8  0x00005583f30d1e02 in mysql_execute_command (thd=thd@entry=0x62b0002df218, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/10.6D/sql/sql_parse.cc:5530
#9  0x00005583f308ae69 in mysql_parse (thd=thd@entry=0x62b0002df218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x20cf03662390) at /data/Server/10.6D/sql/sql_parse.cc:8028
#10 0x00005583f30b8f51 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0002df218, packet=packet@entry=0x6290010c7219 " KILL SOFT CONNECTION @kill_id /* E_R Thread31 QNO 741 CON_ID 131 */ ", packet_length=packet_length@entry=69, blocking=blocking@entry=true) at /data/Server/10.6D/sql/sql_class.h:1337
#11 0x00005583f30c114b in do_command (thd=0x62b0002df218, blocking=blocking@entry=true) at /data/Server/10.6D/sql/sql_parse.cc:1406
#12 0x00005583f34f9777 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x6080000b8138, put_in_cache=put_in_cache@entry=true) at /data/Server/10.6D/sql/sql_connect.cc:1410
#13 0x00005583f34fa325 in handle_one_connection (arg=0x6080000b8138) at /data/Server/10.6D/sql/sql_connect.cc:1312
#14 0x00003424383da609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007ffe871f7293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) bt
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:244
#1  0x000038c565d8f022 in __asan_poison_memory_region () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#2  0x00005583f48d2d58 in trx_t::free (this=this@entry=0x1b9a262b2de8) at /data/Server/10.6D/storage/innobase/trx/trx0trx.cc:422
#3  0x00005583f42f6edf in innobase_close_connection (hton=<optimized out>, thd=0x62b000230218) at /data/Server/10.6D/storage/innobase/handler/ha_innodb.cc:4882
#4  0x00005583f38f9a89 in ha_close_connection (thd=thd@entry=0x62b000230218) at /data/Server/10.6D/sql/handler.cc:916
#5  0x00005583f2f78b37 in THD::free_connection (this=this@entry=0x62b000230218) at /data/Server/10.6D/sql/sql_class.cc:1624
#6  0x00005583f2dc8aee in unlink_thd (thd=0x62b000230218) at /data/Server/10.6D/sql/mysqld.cc:2693
#7  0x00005583f34f9caa in do_handle_one_connection (connect=<optimized out>, connect@entry=0x608000003b38, put_in_cache=put_in_cache@entry=true) at /data/Server/10.6D/sql/sql_connect.cc:1421
#8  0x00005583f34fa325 in handle_one_connection (arg=0x608000003b38) at /data/Server/10.6D/sql/sql_connect.cc:1312
#9  0x00003424383da609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#10 0x00007ffe871f7293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

At the time of the crash, the crashing Thread 3 would have proceeded to the following:

#6  0x000038c565d941ab in __asan_report_load8 () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#7  0x00005583f42fdaeb in innobase_kill_query (thd=0x62b000230218) at /usr/include/c++/9/bits/atomic_base.h:734
#8  0x00005583f38ef058 in kill_handlerton (thd=0x62b000230218, plugin=<optimized out>, level=0x20cf03661320) at /data/Server/10.6D/sql/handler.cc:933
#9  0x00005583f30e4dcf in plugin_foreach_with_mask (thd=thd@entry=0x62b000230218, func=func@entry=0x5583f38eef15 <kill_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=8, 
    arg=arg@entry=0x20cf03661320) at /data/Server/10.6D/sql/sql_plugin.cc:2510
#10 0x00005583f38f9c2e in ha_kill_query (thd=thd@entry=0x62b000230218, level=<optimized out>) at /data/Server/10.6D/sql/handler.cc:940
#11 0x00005583f2f7a2cd in THD::awake_no_mutex (this=this@entry=0x62b000230218, state_to_set=state_to_set@entry=KILL_CONNECTION) at /data/Server/10.6D/sql/sql_class.cc:1925
#12 0x00005583f309484b in kill_one_thread (thd=thd@entry=0x62b0002df218, id=id@entry=107, kill_signal=kill_signal@entry=KILL_CONNECTION, type=type@entry=KILL_TYPE_ID) at /data/Server/10.6D/sql/sql_parse.cc:9256
#13 0x00005583f3095320 in sql_kill (thd=thd@entry=0x62b0002df218, id=107, state=state@entry=KILL_CONNECTION, type=type@entry=KILL_TYPE_ID) at /data/Server/10.6D/sql/sql_parse.cc:9378
#14 0x00005583f30d1e02 in mysql_execute_command (thd=thd@entry=0x62b0002df218, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/10.6D/sql/sql_parse.cc:5530
#15 0x00005583f308ae69 in mysql_parse (thd=thd@entry=0x62b0002df218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x20cf03662390) at /data/Server/10.6D/sql/sql_parse.cc:8028
#16 0x00005583f30b8f51 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0002df218, 
    packet=packet@entry=0x6290010c7219 " KILL SOFT CONNECTION @kill_id /* E_R Thread31 QNO 741 CON_ID 131 */ ", packet_length=packet_length@entry=69, blocking=blocking@entry=true)
    at /data/Server/10.6D/sql/sql_class.h:1337

I suspect that this affects earlier versions as well.

Note: There already is an ASAN error report about KILL QUERY, with RQG grammar, in MDEV-24571, affecting 10.2.

Also note that InnoDB used to work around this kind of races in MDEV-17092, but that work-around was removed in MDEV-23536 when the problem was supposedly fixed.



 Comments   
Comment by Matthias Leich [ 2021-06-24 ]

The failure was observed on
origin/10.5 60ed4797112533a88b11ed450290a0fd48f0ecb8 2021-06-24T11:01:18+03:00
too.
 
Error pattern for RQG
[ 'TBR-975-MDEV-26014', 'ERROR: AddressSanitizer: use-after-poison.{1,300}READ of size .+#0 .{1,30} in innobase_kill_query.+#1 .{1,30}in kill_handlerton.+#2 .{1,30}in plugin_foreach_with_mask.+#3 .{1,30}in ha_kill_query.+#4 .{1,30}in THD::awake_no_mutex.+#5 .{1,30}in kill_one_thread' ]

Comment by Sergei Golubchik [ 2022-09-27 ]

how can I repeat this issue?

Comment by Marko Mäkelä [ 2022-09-28 ]

serg, a starting point could be the RQG grammar in an earlier ticket. Did you try it? Quoting the Description:

Note: There already is an ASAN error report about KILL QUERY, with RQG grammar, in MDEV-24571, affecting 10.2.

Comment by Sergei Golubchik [ 2022-09-28 ]

I'm trying to fix this bug, not MDEV-24571. If this is the same issue, I'll close it as a duplicate.

Generated at Thu Feb 08 09:42:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.