Type: Bug
Priority: Major
Resolution: Fixed
Affects Version/s: 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4, 11.5(EOL), 11.6(EOL), 11.7(EOL)
While testing MDEV-33853, mleich produced a core dump of a server hang. The hang ended in an intentional crash with the infamous message that innodb_fatal_semaphore_wait for dict_sys.latch was exceeded.
In the stack traces that I analyzed, several threads were trying to execute SET GLOBAL innodb_log_file_size, and one DDL operation was waiting for an exclusive log_sys.latch while holding exclusive dict_sys.latch, thereby ultimately blocking all other threads.
I think that the root cause is flawed logic in log_resize_acquire(). It could also make sense to protect that function with LOCK_global_system_variables so that multiple instances of log_resize_acquire() cannot be executed concurrently. A fix might look as follows. It will need some additional stress testing to validate.
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index ce2d3958f9c..27adf670b11 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -18518,9 +18518,7 @@ buffer_pool_load_abort(
static void innodb_log_file_buffering_update(THD *thd, st_mysql_sys_var*,
void *, const void *save)
{
- mysql_mutex_unlock(&LOCK_global_system_variables);
log_sys.set_buffered(*static_cast<const my_bool*>(save));
- mysql_mutex_lock(&LOCK_global_system_variables);
}
#endif
@@ -18528,7 +18526,6 @@ static void innodb_log_file_size_update(THD *thd, st_mysql_sys_var*,
void *var, const void *save)
{
ut_ad(var == &srv_log_file_size);
- mysql_mutex_unlock(&LOCK_global_system_variables);
if (high_level_read_only)
ib_senderrf(thd, IB_LOG_LEVEL_ERROR, ER_READ_ONLY_MODE);
@@ -18551,13 +18548,15 @@ static void innodb_log_file_size_update(THD *thd, st_mysql_sys_var*,
ib_senderrf(thd, IB_LOG_LEVEL_ERROR, ER_CANT_CREATE_HANDLER_FILE);
break;
case log_t::RESIZE_STARTED:
+ mysql_mutex_unlock(&LOCK_global_system_variables);
const lsn_t start{log_sys.resize_in_progress()};
for (timespec abstime;;)
{
if (thd_kill_level(thd))
{
+ mysql_mutex_lock(&LOCK_global_system_variables);
log_sys.resize_abort();
- break;
+ return;
}
set_timespec(abstime, 5);
@@ -18588,9 +18587,9 @@ static void innodb_log_file_size_update(THD *thd, st_mysql_sys_var*,
if (!resizing || resizing > start /* only wait for our resize */)
break;
}
+ mysql_mutex_lock(&LOCK_global_system_variables);
}
}
- mysql_mutex_lock(&LOCK_global_system_variables);
}
static void innodb_log_spin_wait_delay_update(THD *, st_mysql_sys_var*,
diff --git a/storage/innobase/log/log0log.cc b/storage/innobase/log/log0log.cc
index d7aae556ce0..7c94876996c 100644
--- a/storage/innobase/log/log0log.cc
+++ b/storage/innobase/log/log0log.cc
@@ -384,6 +384,8 @@ void log_t::close_file()
/** Acquire all latches that protect the log. */
static void log_resize_acquire()
{
+ mysql_mutex_assert_owner(&LOCK_global_system_variables);
+
if (!log_sys.is_pmem())
{
while (flush_lock.acquire(log_sys.get_lsn() + 1, nullptr) !=
I am also wondering if the logic regarding flush_lock.acquire() and write_lock.acquire(), which are partly visible above, is correct.
wlad, I wonder if there is any chance that another thread in log_write_up_to() could get "in between" and start to wait for log_sys.latch before log_resize_acquire() completes log_sys.latch.wr_lock()? If yes, is there any way to acquire the write_lock and flush_lock with an even larger LSN (such as LSN_MAX) and rewind to the actual LSN once done? I have a concern that a hang could be possible even with the above patch.
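As a rough illustration of the locking discipline that the patch above aims for, here is a minimal standalone sketch. It is not InnoDB code: global_sysvar_mutex, resize_acquire_sketch(), size_update_sketch() and run_sketch() are hypothetical stand-ins for LOCK_global_system_variables, log_resize_acquire() and the update callback, and the real latches are replaced by a simple counter. The idea it shows is that the latch-acquisition phase runs only while the global mutex is held, and the mutex is released only for the potentially long wait.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for LOCK_global_system_variables.
static std::mutex global_sysvar_mutex;
// Number of threads currently inside the "acquire all latches" phase.
static std::atomic<int> in_acquire{0};
// Highest concurrency ever observed inside that phase.
static std::atomic<int> max_in_acquire{0};

// Hypothetical stand-in for log_resize_acquire().
// Assumption: the caller holds global_sysvar_mutex.
static void resize_acquire_sketch()
{
  const int now = ++in_acquire;
  int seen = max_in_acquire.load();
  // Record the maximum number of threads seen in this phase.
  while (now > seen && !max_in_acquire.compare_exchange_weak(seen, now)) {}
  std::this_thread::yield();  // give other threads a chance to race
  --in_acquire;
}

// Hypothetical stand-in for the update callback: it is entered with
// the mutex held and must return with the mutex held.
static void size_update_sketch(std::unique_lock<std::mutex> &held)
{
  assert(held.owns_lock());
  resize_acquire_sketch();    // serialized by the global mutex
  held.unlock();              // release only for the long wait
  std::this_thread::yield();  // stands in for waiting for the resize
  held.lock();                // reacquire before returning to the caller
}

// Run n_threads concurrent "SET GLOBAL" callers and report the
// maximum concurrency observed in the acquisition phase.
static int run_sketch(int n_threads)
{
  std::vector<std::thread> threads;
  for (int i = 0; i < n_threads; i++)
    threads.emplace_back([] {
      std::unique_lock<std::mutex> held(global_sysvar_mutex);
      size_update_sketch(held);
    });
  for (auto &t : threads)
    t.join();
  return max_in_acquire.load();
}
```

With this arrangement at most one thread is ever inside the acquisition phase. It says nothing about whether a thread in log_write_up_to() can still get in between, which is the open question above.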