trx->mysql_thd can be zeroed out between the thd_get_thread_id() and thd_query_safe() calls in fill_trx_row(), because trx_disconnect_prepared() resets trx->mysql_thd. This can cause a null pointer dereference in fill_trx_row().
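The window can be illustrated with a small single-threaded model. This is only a sketch, not InnoDB code: Thd, Trx, Row and fill_row_racy() are hypothetical stand-ins, and the between callback plays the role of a sync point placed between the two reads of the pointer. The explicit null check on the second read is there only so that the model reports the problem instead of crashing.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for THD: only the fields the model needs.
struct Thd { unsigned long thread_id; const char *query; };

// Hypothetical stand-in for trx_t.
struct Trx { Thd *mysql_thd; };

// Hypothetical stand-in for one row of information_schema.innodb_trx.
struct Row { unsigned long thread_id; const char *query; };

// Models the racy pattern: the pointer is checked once, then read again
// later. `between` runs in the window between the two reads.
// Returns false if the second read observed a null pointer, which is the
// point where unguarded code would dereference null.
bool fill_row_racy(Trx &trx, Row &row, const std::function<void()> &between)
{
  if (!trx.mysql_thd)
    return false;                           // initial null check
  row.thread_id = trx.mysql_thd->thread_id; // first use of the pointer

  between();                                // a disconnect may run here

  Thd *thd = trx.mysql_thd;                 // second read of the pointer
  if (!thd)
    return false;                           // model only: report, do not crash
  row.query = thd->query;
  return true;
}
```

Passing a callback that sets trx.mysql_thd to nullptr makes fill_row_racy() return false even though the initial check passed, which mirrors what the test case below forces with DEBUG_SYNC.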
The bug can be reproduced with the following new sync point:
/* todo/fixme: suggest to do it at innodb prepare */
trx->will_lock= false;
trx_sys.rw_trx_hash.put_pins(trx);
and the following test case:
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
--source include/count_sessions.inc
--connection default
create table t (a int) engine=innodb;
insert into t values(1);
--connect (con_xa, localhost, root,,)
SET DEBUG_SYNC="trx_disconnect_prepared_reset_thd SIGNAL thd_reset";
xa start '1';
insert into t values(1);
xa end '1';
xa prepare '1';
--connection default
SET DEBUG_SYNC="fill_trx_row_before_query_request SIGNAL reached WAIT_FOR fill_row_cont";
--send select * from information_schema.innodb_trx;
--connect (con_sync, localhost, root,,)
SET DEBUG_SYNC="now WAIT_FOR reached";
--disconnect con_xa
SET DEBUG_SYNC="now WAIT_FOR thd_reset";
SET DEBUG_SYNC="now SIGNAL fill_row_cont";
--disconnect con_sync
--connection default
--disable_result_log
# Must crash here with SIGSEGV if not fixed
--reap;
--enable_result_log
xa commit '1';
drop table t;
SET DEBUG_SYNC="RESET";
--source include/wait_until_count_sessions.inc
It does not affect 10.3 as 10.3 does not detach XA on disconnection (compare THD::cleanup() in 10.3 and 10.4+ and see trans_xa_detach() in 10.4+ for details).
Until MDEV-29368 is fixed, the workaround is not to query innodb_trx, innodb_locks and innodb_lock_waits in information_schema while detached XA transactions exist.
Issue Links
blocks: MDEV-28709 unexpected X lock on Supremum in READ COMMITTED (Closed)
is blocked by: MDEV-29368 Assertion `trx->mysql_thd == thd' failed in innobase_kill_query from process_timers/timer_handler and use-after-poison in innobase_kill_query
Vladislav Lesin
added a comment - Useful comment from marko:
A possible fix might be to acquire THD::LOCK_thd_data in a safe way, and to ensure that a disconnect would be blocked by it. Currently that is not the case; see MDEV-29368.
The function fill_trx_row() does check whether trx_t::mysql_thd is a null pointer, but race conditions after that point are possible. Holding exclusive lock_sys.latch (or before 10.6, lock_sys.mutex) will block lock_release() during trx_t::release_locks() but not the transaction state change.
When it comes to race conditions with innobase_close_connection(), there is no way to synchronize with trx_disconnect_prepared(). The innobase_rollback_trx() would be blocked by lock_sys.latch and trx_t::mutex.
Vladislav Lesin
added a comment - 10.6 code analysis:
fill_trx_row() is invoked from fetch_data_into_cache(), which in turn iterates over transactions with trx_sys.trx_list.for_each(), which holds trx_sys.trx_list.mutex during the iteration.
On the other hand, innobase_close_connection() invokes trx->free(), which removes the transaction from the transaction container via trx_sys.deregister_trx(this), which also acquires trx_sys.trx_list.mutex. Only after this does trx_t::free() zero out trx->mysql_thd. That is why the bug does not show up on an ordinary disconnect.
trx_disconnect_prepared() is invoked only from innobase_close_connection(). I think trx_disconnect_prepared() must zero out trx->mysql_thd under trx_sys.trx_list.mutex, copying the above logic of trx_t::free(). The thread_safe_trx_ilist_t::freeze()/unfreeze() functions could be used for that.
The same is true for 10.4. See fetch_data_into_cache() for details.
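The idea of resetting the pointer under the same mutex that protects the iteration can be sketched as a toy model. This is not InnoDB code and only illustrates the locking discipline: Thd, Trx, TrxList, disconnect_prepared() and snapshot_thread_ids() are hypothetical names, and a plain std::mutex stands in for trx_sys.trx_list.mutex.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for THD; only the thread id matters here.
struct Thd { unsigned long thread_id; };

// Hypothetical stand-in for trx_t.
struct Trx { Thd *mysql_thd; };

// Toy model of a transaction list protected by a single mutex.
class TrxList
{
public:
  void add(Trx *trx)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    list_.push_back(trx);
  }

  // Models the proposed change: the pointer is reset while the list
  // mutex is held, so it cannot change while a reader is iterating.
  void disconnect_prepared(Trx *trx)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    trx->mysql_thd = nullptr;
  }

  // Models the reader: the mutex is held for the whole iteration, so the
  // null check and the later use see the same pointer value.
  // A transaction with no attached session is reported as id 0.
  std::vector<unsigned long> snapshot_thread_ids()
  {
    std::lock_guard<std::mutex> guard(mutex_);
    std::vector<unsigned long> ids;
    for (const Trx *trx : list_)
      ids.push_back(trx->mysql_thd ? trx->mysql_thd->thread_id : 0);
    return ids;
  }

private:
  std::mutex mutex_;
  std::vector<Trx*> list_;
};
```

In this model a reader sees either the attached session or a detached transaction, never a pointer that changes partway through building a row. Whether the same discipline is sufficient in the server depends on the other code paths that touch trx->mysql_thd, which the sketch does not cover.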
Vladislav Lesin
added a comment - The bug does not affect 10.3, because the following commit:
commit 0993d6b81b6cf7a5fc0710d99e962a8271018b9d
Author: Sergey Vojtovich <svoj@mariadb.org>
Date: Fri Mar 30 00:33:58 2018 +0400
MDEV-15773 - trx_sys.mysql_trx_list -> trx_sys.trx_list
Replaced "list of transactions created for MySQL" with "list of all
transactions". This simplifies code and allows further removal of
trx_sys.m_views.
exists only in 10.4. In 10.3, trx_disconnect_prepared() is simply not invoked by the above test. We will nevertheless fix 10.3 as well, so that it does not break if somebody decides to backport the above commit to 10.3.
Vladislav Lesin
added a comment - The bug is caused by the following commit:
commit 0993d6b81b6cf7a5fc0710d99e962a8271018b9d
Author: Sergey Vojtovich <svoj@mariadb.org>
Date: Fri Mar 30 00:33:58 2018 +0400
MDEV-15773 - trx_sys.mysql_trx_list -> trx_sys.trx_list
Replaced "list of transactions created for MySQL" with "list of all
transactions". This simplifies code and allows further removal of
trx_sys.m_views.
Before that commit, trx_disconnect_from_mysql() contained the critical section. The commit replaced that function with trx_disconnect_prepared(), which does not contain the critical section.