Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.10(EOL), 10.11
None
Description
trx->mysql_thd can be zeroed out between the thd_get_thread_id() and thd_query_safe() calls in fill_trx_row(), because trx_disconnect_prepared() resets trx->mysql_thd to NULL. This can cause a null pointer dereference in fill_trx_row().
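The hazard described above is a double read of a shared pointer. The following is a minimal, single-threaded sketch of that shape, not the server code and not the fix that was committed: `Session`, `Transaction`, `Row`, `fill_row_racy()` and `fill_row_snapshot()` are hypothetical names, and the `between_reads` callback stands in for another thread running trx_disconnect_prepared() at exactly the wrong moment.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for THD and trx_t; not the server's real types.
struct Session {
    unsigned long thread_id;
    std::string query;
};

struct Transaction {
    std::atomic<Session*> session{nullptr};
};

struct Row {
    unsigned long thread_id = 0;
    std::string query;
    bool saw_null_on_second_read = false;
};

// Racy shape: the pointer is read twice, so it can change in between.
// `between_reads` simulates another thread running at that point.
Row fill_row_racy(Transaction& trx,
                  const std::function<void()>& between_reads) {
    Row row;
    if (Session* s = trx.session.load()) {
        row.thread_id = s->thread_id;
    }
    between_reads();
    Session* s = trx.session.load();  // second, independent read
    if (s == nullptr) {
        // Code without this check would dereference a null pointer here;
        // the sketch only records that the pointer vanished.
        row.saw_null_on_second_read = true;
        return row;
    }
    row.query = s->query;
    return row;
}

// Snapshot shape: one read, one null check, then only the local copy is used.
Row fill_row_snapshot(Transaction& trx,
                      const std::function<void()>& between_reads) {
    Row row;
    Session* s = trx.session.load();
    if (s == nullptr) {
        return row;
    }
    row.thread_id = s->thread_id;
    between_reads();
    row.query = s->query;  // same pointer that passed the null check
    return row;
}
```

Reading the pointer once only removes the null dereference in this sketch. It does not keep the session object alive, so it would not be sufficient on its own if the object could be freed concurrently.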
The bug can be reproduced with the following new sync point:
diff --git a/storage/innobase/trx/trx0i_s.cc b/storage/innobase/trx/trx0i_s.cc
index 2dc39118d3d..c2cc8c970b0 100644
--- a/storage/innobase/trx/trx0i_s.cc
+++ b/storage/innobase/trx/trx0i_s.cc
@@ -461,6 +461,8 @@ fill_trx_row(
     row->trx_mysql_thread_id = thd_get_thread_id(trx->mysql_thd);
 
     char query[TRX_I_S_TRX_QUERY_MAX_LEN + 1];
+    ut_d(if (trx->state == TRX_STATE_PREPARED)
+         DEBUG_SYNC_C("fill_trx_row_before_query_request"));
     if (size_t stmt_len = thd_query_safe(trx->mysql_thd, query,
                                          sizeof query)) {
         row->trx_query = static_cast<const char*>(
diff --git a/storage/innobase/trx/trx0trx.cc b/storage/innobase/trx/trx0trx.cc
index 3b19d213d5a..92bf9de375a 100644
--- a/storage/innobase/trx/trx0trx.cc
+++ b/storage/innobase/trx/trx0trx.cc
@@ -550,6 +550,7 @@ void trx_disconnect_prepared(trx_t *trx)
   trx->read_view.close();
   trx->is_recovered= true;
   trx->mysql_thd= NULL;
+  DEBUG_SYNC_C("trx_disconnect_prepared_reset_thd");
   /* todo/fixme: suggest to do it at innodb prepare */
   trx->will_lock= false;
   trx_sys.rw_trx_hash.put_pins(trx);
and the following test case:
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/count_sessions.inc

--connection default
create table t (a int) engine=innodb;
insert into t values(1);

--connect (con_xa, localhost, root,,)
SET DEBUG_SYNC="trx_disconnect_prepared_reset_thd SIGNAL thd_reset";
xa start '1';
insert into t values(1);
xa end '1';
xa prepare '1';

--connection default
SET DEBUG_SYNC="fill_trx_row_before_query_request SIGNAL reached WAIT_FOR fill_row_cont";
--send select * from information_schema.innodb_trx;

--connect (con_sync, localhost, root,,)
SET DEBUG_SYNC="now WAIT_FOR reached";
--disconnect con_xa
SET DEBUG_SYNC="now WAIT_FOR thd_reset";
SET DEBUG_SYNC="now SIGNAL fill_row_cont";
--disconnect con_sync

--connection default
--disable_result_log
# Must crash here with SIGSEGV if not fixed
--reap;
--enable_result_log
xa commit '1';
drop table t;
SET DEBUG_SYNC="RESET";
--source include/wait_until_count_sessions.inc
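The test case forces one specific interleaving with named signals. As a rough illustration of that handshake only, here is a self-contained sketch: the `SyncPoints` class and `replay_interleaving()` are hypothetical and merely modelled on SIGNAL / WAIT_FOR, they are not the server's DEBUG_SYNC implementation. The signal names are taken from the test above; for brevity the disconnecting thread waits for `reached` itself, whereas in the test con_sync does that and then disconnects con_xa.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Hypothetical named-signal registry, loosely modelled on SIGNAL / WAIT_FOR.
class SyncPoints {
public:
    void signal(const std::string& name) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            raised_.insert(name);
        }
        cv_.notify_all();
    }

    void wait_for(const std::string& name) {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [&] { return raised_.count(name) != 0; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::set<std::string> raised_;
};

// Replays the ordering enforced by the test and returns the observed events.
std::vector<std::string> replay_interleaving() {
    SyncPoints sync;
    std::vector<std::string> events;
    std::mutex events_mutex;
    auto record = [&](const char* event) {
        std::lock_guard<std::mutex> guard(events_mutex);
        events.emplace_back(event);
    };

    // Plays the role of the SELECT from information_schema.innodb_trx.
    std::thread reader([&] {
        record("reader: thread id read");
        sync.signal("reached");
        sync.wait_for("fill_row_cont");
        record("reader: query read");
    });

    // Plays the role of the disconnecting XA connection.
    std::thread disconnector([&] {
        sync.wait_for("reached");
        record("disconnect: pointer cleared");
        sync.signal("thd_reset");
    });

    // The calling thread plays the role of the controlling connection.
    sync.wait_for("thd_reset");
    sync.signal("fill_row_cont");

    reader.join();
    disconnector.join();
    return events;
}
```

With this handshake the pointer is always cleared between the two reads, which is what makes the crash reproducible rather than timing dependent.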
The bug does not affect 10.3, because 10.3 does not detach XA transactions on disconnection (compare THD::cleanup() in 10.3 and 10.4+, and see trans_xa_detach() in 10.4+ for details).
Until MDEV-29368 is fixed, the workaround is to avoid querying innodb_trx, innodb_locks and innodb_lock_waits in information_schema while detached XA transactions exist.
Issue Links
- blocks
  - MDEV-28709 unexpected X lock on Supremum in READ COMMITTED (Closed)
- is blocked by
  - MDEV-29368 Assertion `trx->mysql_thd == thd' failed in innobase_kill_query from process_timers/timer_handler and use-after-poison in innobase_kill_query (Closed)
- is caused by
  - MDEV-15773 Simplify away trx_sys_t::m_views (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue is blocked by |
Link | This issue relates to MENT-1632 [ MENT-1632 ] |
Link | This issue relates to MENT-1632 [ MENT-1632 ] |
Link | This issue is duplicated by MENT-1632 [ MENT-1632 ] |
Assignee | Vladislav Lesin [ vlad.lesin ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Link | This issue blocks |
Status | Open [ 1 ] | In Progress [ 3 ] |
Description |
trx->mysql_thd can be zeroed-out between thd_get_thread_id() and thd_query_safe() calls in fill_trx_row(). trx_disconnect_prepared() zeroes out trx->mysql_thd. And this can cause null pointer dereferencing in fill_trx_row().
The bug can be reproduced with the following new sync point:
{code:java}
--- a/storage/innobase/trx/trx0i_s.cc
+++ b/storage/innobase/trx/trx0i_s.cc
@@ -457,6 +457,7 @@ fill_trx_row(
     row->trx_mysql_thread_id = thd_get_thread_id(trx->mysql_thd);
 
     char query[TRX_I_S_TRX_QUERY_MAX_LEN + 1];
+    DEBUG_SYNC_C("fill_trx_row_before_query_safe");
     if (size_t stmt_len = thd_query_safe(trx->mysql_thd, query,
                                          sizeof query)) {
         row->trx_query = static_cast<const char*>(
{code}
and the following test case:
{code:java}
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/count_sessions.inc
create table t1 (a int) engine=innodb;
insert into t1 values(1);
--connect (con_xa, localhost, root,,)
xa start '1';
insert into t1 values(1);
xa end '1';
xa prepare '1';
--connection default
SET DEBUG_SYNC="fill_trx_row_before_query_safe SIGNAL reached WAIT_FOR cont";
--send select * from information_schema.innodb_trx;
--connect (con_sync, localhost, root,,)
SET DEBUG_SYNC="now WAIT_FOR reached";
--disconnect con_xa
SET DEBUG_SYNC="now SIGNAL cont";
--disconnect con_sync
--connection default
# Must crash here with SIGSEGV if not fixed
--reap;
xa commit '1';
drop table t;
--source include/wait_until_count_sessions.inc
{code}
Note the above test case can be unstable, as "fill_trx_row_before_query_safe sync" point must wait until trx_disconnect_prepared() zeroes out trx->mysql_thd.
It does not affect 10.3 as 10.3 does not detach XA on disconnection (compare THD::cleanup() in 10.3 and 10.4+ and see trans_xa_detach() in 10.4+ for details).
Until |
Same text as the current Description above. |
Summary | access to innodb_trx, innodb_locks and innodb_lock_waits along with detached XA's can cause SIGSEGV | Access to innodb_trx, innodb_locks and innodb_lock_waits along with detached XA's can cause SIGSEGV |
Link | This issue is caused by |
Fix Version/s | 10.3 [ 22126 ] |
Fix Version/s | 10.3.37 [ 28404 ] | |
Fix Version/s | 10.4.27 [ 28405 ] | |
Fix Version/s | 10.5.18 [ 28421 ] | |
Fix Version/s | 10.6.11 [ 28441 ] | |
Fix Version/s | 10.7.7 [ 28442 ] | |
Fix Version/s | 10.8.6 [ 28443 ] | |
Fix Version/s | 10.9.4 [ 28444 ] | |
Fix Version/s | 10.10.2 [ 28410 ] | |
Fix Version/s | 10.11.1 [ 28454 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Zendesk Related Tickets | 147679 |
Useful comment from marko:
A possible fix might be to acquire THD::LOCK_thd_data in a safe way, and to ensure that a disconnect would be blocked by it. Currently that is not the case; see MDEV-29368.
The function fill_trx_row() does check whether trx_t::mysql_thd is a null pointer, but race conditions after that point are possible. Holding exclusive lock_sys.latch (or, before 10.6, lock_sys.mutex) will block lock_release() during trx_t::release_locks(), but not the transaction state change.
When it comes to race conditions with innobase_close_connection(), there is no way to synchronize with trx_disconnect_prepared(). The innobase_rollback_trx() would be blocked by lock_sys.latch and trx_t::mutex.
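The idea in the comment, that a reader holds a lock which a disconnect must also take, can be sketched as follows. This is only an illustration under assumptions: `Session`, `Transaction`, `read_query()`, `disconnect()` and `try_disconnect_on_other_thread()` are hypothetical, the plain `std::mutex` merely stands in for something like THD::LOCK_thd_data, and nothing here is claimed to match the fix that was eventually applied.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a session object; not the server's THD.
struct Session {
    std::string query;
};

struct Transaction {
    std::mutex session_mutex;    // guards `session`
    Session* session = nullptr;
};

// Reader: checks and uses the pointer while holding the lock.
// `while_locked` runs with the lock held, to let a caller probe the state.
std::string read_query(Transaction& trx,
                       const std::function<void()>& while_locked) {
    std::lock_guard<std::mutex> guard(trx.session_mutex);
    if (trx.session == nullptr) {
        return "";
    }
    while_locked();
    return trx.session->query;
}

// Disconnect: blocks until no reader holds the lock, then detaches.
void disconnect(Transaction& trx) {
    std::lock_guard<std::mutex> guard(trx.session_mutex);
    trx.session = nullptr;
}

// Non-blocking probe run on a separate thread (calling try_lock on a
// std::mutex from the thread that already owns it would be undefined).
// Returns true only if the detach actually happened.
bool try_disconnect_on_other_thread(Transaction& trx) {
    bool detached = false;
    std::thread prober([&] {
        std::unique_lock<std::mutex> lock(trx.session_mutex,
                                          std::try_to_lock);
        if (lock.owns_lock()) {
            trx.session = nullptr;
            detached = true;
        }
    });
    prober.join();
    return detached;
}
```

While `read_query()` holds the lock, the probe cannot detach the session, so the pointer that passed the null check stays valid until the copy is complete. The cost is that disconnects wait for readers, which is why the comment stresses acquiring the lock "in a safe way".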