Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.2.18
-
None
-
Ubuntu 17.04 but most probably unimportant.
Description
RQG testing with the test conf/mariadb/tables_stress* on
|
bb-10.2-marko commit 0aa9db1b4877ed411eae4175c059546fa07faa4f
|
compiled with ASAN
|
|
#3 <signal handler called>
|
#4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
|
#5 0x00007f8f6062737a in __GI_abort () at abort.c:89
|
#6 0x00007f8f61db9169 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.3
|
#7 0x00007f8f61dae06b in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.3
|
#8 0x00007f8f61da7457 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.3
|
#9 0x00007f8f61da7b47 in __asan_report_load1 () from /usr/lib/x86_64-linux-gnu/libasan.so.3
|
#10 0x000055ad8d70277d in lock_trx_handle_wait_low (trx=0x7f8f55009778)
|
at storage/innobase/lock/lock0lock.cc:6926
|
#11 0x000055ad8d702948 in lock_trx_handle_wait (trx=0x7f8f55009778)
|
at storage/innobase/lock/lock0lock.cc:6950
|
#12 0x000055ad8d5daa5b in innobase_kill_query (thd=0x62a0001e0208)
|
at storage/innobase/handler/ha_innodb.cc:5249
|
#13 0x000055ad8d0f4dc5 in kill_handlerton (thd=0x62a0001e0208, plugin=0x60200002e618, level=0x7f8f2d52d8c4)
|
at sql/handler.cc:808
|
#14 0x000055ad8cb695fc in plugin_foreach_with_mask (thd=0x62a0001e0208, func=0x55ad8d0f4c96 <kill_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7f8f2d52d8c4)
|
at sql/sql_plugin.cc:2398
|
#15 0x000055ad8d0f4e99 in ha_kill_query (thd=0x62a0001e0208, level=THD_ABORT_ASAP)
|
at sql/handler.cc:815
|
#16 0x000055ad8ca7cdbc in THD::awake (this=0x62a0001e0208, state_to_set=KILL_CONNECTION_HARD)
|
at sql/sql_class.cc:1710
|
#17 0x000055ad8cb4dce7 in kill_one_thread (thd=0x62a0001bc208, id=135, kill_signal=KILL_QUERY_HARD, type=KILL_TYPE_ID)
|
at sql/sql_parse.cc:8897
|
#18 0x000055ad8cb4e669 in sql_kill (thd=0x62a0001bc208, id=135, state=KILL_QUERY_HARD, type=KILL_TYPE_ID)
|
at sql/sql_parse.cc:9003
|
#19 0x000055ad8cb38361 in mysql_execute_command (thd=0x62a0001bc208)
|
at sql/sql_parse.cc:5472
|
#20 0x000055ad8cb47f28 in mysql_parse (thd=0x62a0001bc208, rawbuf=0x62b0000c4220 "KILL QUERY @kill_id /* E_R Thread31 QNO 771 CON_ID 288 */", length=58, parser_state=0x7f8f2d52f560, is_com_multi=false, is_next_command=false)
|
at sql/sql_parse.cc:8009
|
#21 0x000055ad8cb22c46 in dispatch_command (command=COM_QUERY, thd=0x62a0001bc208, packet=0x629003886209 " KILL QUERY @kill_id /* E_R Thread31 QNO 771 CON_ID 288 */ ", packet_length=60, is_com_multi=false, is_next_command=false)
|
at sql/sql_parse.cc:1824
|
#22 0x000055ad8cb1fce7 in do_command (thd=0x62a0001bc208)
|
at sql/sql_parse.cc:1377
|
#23 0x000055ad8ce50fc2 in do_handle_one_connection (connect=0x60800000b2a8)
|
at sql/sql_connect.cc:1335
|
#24 0x000055ad8ce509a7 in handle_one_connection (arg=0x60800000b2a8)
|
at sql/sql_connect.cc:1241
|
#25 0x00007f8f612666da in start_thread (arg=0x7f8f2d531700) at pthread_create.c:456
|
|
SUMMARY: AddressSanitizer: use-after-poison /work_m/bb-10.2-marko/storage/innobase/lock/lock0lock.cc:6926 in lock_trx_handle_wait_low
|
Shadow bytes around the buggy address:
|
0x0ff26a9f92b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
0x0ff26a9f92c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
0x0ff26a9f92d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
0x0ff26a9f92e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
0x0ff26a9f92f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 f7 f7 f7
|
=>0x0ff26a9f9300: f7 f7 f7 f7 f7 f7 f7 f7 f7[f7]f7 f7 f7 f7 f7 f7
|
0x0ff26a9f9310: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
|
0x0ff26a9f9320: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
|
0x0ff26a9f9330: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
|
0x0ff26a9f9340: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
|
0x0ff26a9f9350: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
|
Attachments
Issue Links
- duplicates
-
MDEV-8088 [Draft] Server crashed on KILL QUERY
-
- Closed
-
- is duplicated by
-
MDEV-22670 AddressSanitizer: use-after-poison 10.2/storage/innobase/lock/lock0lock.cc:6557 in lock_trx_handle_wait_low
-
- Closed
-
- relates to
-
MDEV-23328 Server hang due to Galera lock conflict resolution
-
- Closed
-
-
MDEV-23536 Race condition between KILL and transaction commit
-
- Closed
-
-
MDEV-26014 Race condition between KILL and client disconnect
-
- Closed
-
-
MDEV-29368 Assertion `trx->mysql_thd == thd' failed in innobase_kill_query from process_timers/timer_handler and use-after-poison in innobase_kill_query
-
- Closed
-
Sad to see 5530a93f4. My comments are based on 10.5.
First of all it doesn't really fix the broader problem, but rather silences symptoms.
+ if (trx_t* trx= thd_to_trx(thd))
+ {
---- THE GAP ---
+ lock_mutex_enter();
+ trx_sys_mutex_enter();
While going through THE GAP concurrent thread may release (or even free?) trx back to trx_pool and another thread may reuse it. In effect it may break innocent transaction, even worse it may break innocent transaction of another user which it is not permitted to kill.
Secondly, trx is found via server_threads/thd->ha_data and not via trx_sys.trx_list. The latter is officially protected by trx_sys.mutex, the former is not. The fact that trx_sys.mutex helps here is pure coincidence and can in no way be considered as meaningful design.
Thirdly, the bug is in the server and not in InnoDB. Consider the following...
KILL thread:
rdlock(server_threads.rwlock);
thd= server_threads.find();
lock(thd->LOCK_thd_kill);
unlock(server_thread.rwlock);
thd->awake()/.../innobase_kill_query();
unlock(thd->LOCK_thd_kill);
InnoDB thd->ha_data is reset only upon disconnect, in ha_close_connection(). Relevant pseudo code:
wrlock(server_thread.rwlock);
server_threads.erase(thd);
unlock(server_thread.rwlock);
thd->free_connection()/ha_close_connection()/innobase_close_connection()/trx_free()
If you look carefully into the above, you can conclude that thd->free_connection() can be called concurrently with KILL/thd->awake(). Which is the bug. And it is partially fixed in THD::~THD(), that is destructor waits for KILL completion:
/*
Other threads may have a lock on LOCK_thd_kill to ensure that this
THD is not deleted while they access it. The following mutex_lock
ensures that no one else is using this THD and it's now safe to delete
*/
if (WSREP_NNULL(this)) mysql_mutex_lock(&LOCK_thd_data);
mysql_mutex_lock(&LOCK_thd_kill);
mysql_mutex_unlock(&LOCK_thd_kill);
if (WSREP_NNULL(this)) mysql_mutex_unlock(&LOCK_thd_data);
However this only works when THD is actually destroyed, whereas it must've been called when THD is released to the thread cache.
If the above empty critical section would've been moved to thd->free_connection(), the whole 5530a93f4 would not be needed.