Type:
Bug
Priority:
Critical
Resolution:
Unresolved
Affects Version/s:
5.5(EOL) , (16) 10.1(EOL) , 10.2(EOL) , 10.3(EOL) , 10.4(EOL) , 10.5 , 10.6 , 10.7(EOL) , 10.8(EOL) , 10.9(EOL) , 10.10(EOL) , 10.11 , 11.0(EOL) , 11.1(EOL) , 11.2(EOL) , 11.3(EOL) , 11.4
Recently, a customer experienced a crash of MariaDB Server 10.6.14 due to innodb_fatal_semaphore_wait_threshold being exceeded for dict_sys.latch . The latch was being exclusively held by a thread that was executing dict_stats_fetch_from_ps() in order to fetch persistent statistics while opening a table.
Among the stack traces of some thousands of threads, there was one thread executing fdatasync() inside buf_page_write_complete() . This appears to be the only write that was in progress at the time of the crash (write_slots->m_cache.m_pos == 1 ). This would seem to indicate that there was some performance problem with the storage subsystem that ultimately caused the innodb_fatal_semaphore_wait_threshold watchdog to kick in.
The invocation of log_free_check() inside the InnoDB internal SQL interpreter in que_run_threads_low() does not make sense for read-only operations, such as loading statistics. While the call cannot cause deadlocks, unnecessarily waiting for something while holding an exclusive latch is not good practice. The purpose of log_free_check() is to prevent an overrun of the circular ib_logfile0 , by ensuring that the log checkpoint will be advanced before writes can continue.
The reason why the wait cannot cause deadlocks is that by design, a thread is not allowed to invoke log_free_check() or wait for dict_sys.latch while it is holding a buffer page latch or any other latch (see MDEV-32049 ) that other page-latch holding threads may be waiting for. The buf_flush_page_cleaner() thread itself never waits for dict_sys.latch . The progress of log checkpoints could be blocked if any thread is holding a write latch on a block that would need to be written to a data page in order to advance the checkpoint.
I checked all code paths where dict_sys.lock(SRW_LOCK_CALL) is being invoked in MariaDB Server 10.6. The following code paths may invoke log_free_check() :
dict_stats_save() (saving persistent statistics; before 10.6, this was the main trigger of the infamous MDEV-15020 deadlocks)
dict_stats_fetch_from_ps() (loading persistent statistics)
fts_load_user_stopword() (loading stopwords for FULLTEXT INDEX )
(fts_parse_sql() is invoking pars_sql() but not executing it)
innodb_drop_database() (executing DROP DATABASE ) is accessing FOREIGN KEY constraints; this is subject to change in MDEV-16417
many DDL operations are invoking the InnoDB internal SQL parser to update the InnoDB data dictionary tables while holding exclusive dict_sys.latch
A notable subsystem that is clean in this respect is the loading of table definitions from the InnoDB data dictionary tables into the dict_sys cache.
We might try to alleviate this problem by adding a log_free_check() invocation to each of the functions listed above, right before the dict_sys.lock acquisition. But, I do not think that it would bring much on a heavily loaded server; concurrently other threads would be able to write more log while we were waiting, and the log_free_check() invocation inside que_run_threads_low() could end up waiting again, and that time blocking the progress of many other threads.
A sustainable fix would seem to be to refactor all use of the InnoDB internal SQL parser and remove that code, so that we have a better overview where log_free_check() needs to be invoked and dict_sys.latch really needs to be held. For DDL operations this is going to be tricky. For persistent statistics and operations related to FULLTEXT INDEX , this is more doable.
We could also check if the operations on persistent statistics can be split so that only pars_sql() is invoked while holding dict_sys.latch , similar to how fts_parse_sql() appears to work.
{"report":{"fcp":736.2999999821186,"ttfb":180.7999999821186,"pageVisibility":"visible","entityId":128218,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":1,"journeyId":"3aae0d4b-520d-4ab0-b8dc-11bca5f8ead0","navigationType":0,"readyForUser":820.3999999761581,"redirectCount":0,"resourceLoadedEnd":848,"resourceLoadedStart":186,"resourceTiming":[{"duration":42.89999997615814,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2bv2/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":186,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":186,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":228.89999997615814,"responseStart":0,"secureConnectionStart":0},{"duration":43.20000001788139,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2bv2/820016/12ta74/2380add21a9a1006587582385952de73/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":186.2999999821186,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":186.2999999821186,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":229.5,"responseStart":0,"secureConnectionStart":0},{"duration":117.40000000596046,"initiatorType":"script","name":"https://jira.mariadb.org/s/e9b27a47da5fb0f74a35acd57e9847fb-CDN/lu2bv2/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":186.39999997615814,"connectEnd":186.39999997615814,"connectStart":186.39999997615814,"domainLookupEnd":186.39999997615814,"domainLookupStart":186.39999997615814,"fetchStart":186.39999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":186.39999997615814,"responseEnd":303.7999999821186,"responseStart":303.7999999821186,"secureConnectionStart":186.39999997615814},{"duration":141.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/c32eb0da7ad9831253f8397e6cc26afd-CDN/lu2bv2/820016/12ta74/2380add21a9a1006587582385952de73/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":186.59999999403954,"connectEnd":186.59999999403954,"connectStart":186.59999999403954,"domainLookupEnd":186.59999999403954,"domainLookupStart":186.59999999403954,"fetchStart":186.59999999403954,"redirectEnd":0,"redirectStart":0,"requestStart":186.59999999403954,"responseEnd":328.09999999403954,"responseStart":328.09999999403954,"secureConnectionStart":186.59999999403954},{"duration":144.7000000178814,"initiatorType":"script","name":"https://jira.mariadb.org/s/bc0bcb146314416123c992714ee00ff7-CDN/lu2bv2/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":186.89999997615814,"connectEnd":186.89999997615814,"connectStart":186.89999997615814,"domainLookupEnd":186.89999997615814,"domainLookupStart":186.89999997615814,"fetchStart":186.89999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":186.89999997615814,"responseEnd":331.59999999403954,"responseStart":331.59999999403954,"secureConnectionStart":186.89999997615814},{"duration":145.09999999403954,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bv2/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":187,"connectEnd":187,"connectStart":187,"domainLookupEnd":187,"domainLookupStart":187,"fetchStart":187,"redirectEnd":0,"redirectStart":0,"requestStart":187,"responseEnd":332.09999999403954,"responseStart":332.09999999403954,"secureConnectionStart":187},{"duration":145.30000001192093,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bv2/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":187.19999998807907,"connectEnd":187.19999998807907,"connectStart":187.19999998807907,"domainLookupEnd":187.19999998807907,"domainLookupStart":187.19999998807907,"fetchStart":187.19999998807907,"redirectEnd":0,"redirectStart":0,"requestStart":187.19999998807907,"responseEnd":332.5,"responseStart":332.5,"secureConnectionStart":187.19999998807907},{"duration":146,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2bv2/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":187.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":187.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":333.5,"responseStart":0,"secureConnectionStart":0},{"duration":145.2999999821186,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":187.59999999403954,"connectEnd":187.59999999403954,"connectStart":187.59999999403954,"domainLookupEnd":187.59999999403954,"domainLookupStart":187.59999999403954,"fetchStart":187.59999999403954,"redirectEnd":0,"redirectStart":0,"requestStart":187.59999999403954,"responseEnd":332.89999997615814,"responseStart":332.89999997615814,"secureConnectionStart":187.59999999403954},{"duration":145.7000000178814,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2bv2/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":187.7999999821186,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":187.7999999821186,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":333.5,"responseStart":0,"secureConnectionStart":0},{"duration":145.60000002384186,"initiatorType":"script","name":"https://jira.mariadb.org/s/719848dd97ebe0663199f49a3936487a-CDN/lu2bv2/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":187.89999997615814,"connectEnd":187.89999997615814,"connectStart":187.89999997615814,"domainLookupEnd":187.89999997615814,"domainLookupStart":187.89999997615814,"fetchStart":187.89999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":187.89999997615814,"responseEnd":333.5,"responseStart":333.5,"secureConnectionStart":187.89999997615814},{"duration":144.09999999403954,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bv2/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":193.19999998807907,"connectEnd":193.19999998807907,"connectStart":193.19999998807907,"domainLookupEnd":193.19999998807907,"domainLookupStart":193.19999998807907,"fetchStart":193.19999998807907,"redirectEnd":0,"redirectStart":0,"requestStart":193.19999998807907,"responseEnd":337.2999999821186,"responseStart":337.2999999821186,"secureConnectionStart":193.19999998807907},{"duration":257.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bv2/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":193.19999998807907,"connectEnd":193.19999998807907,"connectStart":193.19999998807907,"domainLookupEnd":193.19999998807907,"domainLookupStart":193.19999998807907,"fetchStart":193.19999998807907,"redirectEnd":0,"redirectStart":0,"requestStart":193.19999998807907,"responseEnd":450.69999998807907,"responseStart":450.69999998807907,"secureConnectionStart":193.19999998807907},{"duration":127,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":527.5,"connectEnd":527.5,"connectStart":527.5,"domainLookupEnd":527.5,"domainLookupStart":527.5,"fetchStart":527.5,"redirectEnd":0,"redirectStart":0,"requestStart":527.5,"responseEnd":654.5,"responseStart":654.5,"secureConnectionStart":527.5},{"duration":124.5,"initiatorType":"script","name":"https://www.google-analytics.com/analytics.js","startTime":730.3999999761581,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":730.3999999761581,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":854.8999999761581,"responseStart":0,"secureConnectionStart":0},{"duration":109.10000002384186,"initiatorType":"link","name":"https://jira.mariadb.org/s/d5715adaadd168a9002b108b2b039b50-CDN/lu2bv2/820016/12ta74/be4b45e9cec53099498fa61c8b7acba4/_/download/contextbatch/css/jira.project.sidebar,-_super,-project.issue.navigator,-jira.general,-jira.browse.project,-jira.view.issue,-jira.global,-atl.general,-com.atlassian.jira.projects.sidebar.init/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":738.8999999761581,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":738.8999999761581,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":848,"responseStart":0,"secureConnectionStart":0},{"duration":70.19999998807907,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2bv2/820016/12ta74/e65b778d185daf5aee24936755b43da6/_/download/contextbatch/js/browser-metrics-plugin.contrib,-_super,-project.issue.navigator,-jira.view.issue,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true","startTime":739.5999999940395,"connectEnd":739.5999999940395,"connectStart":739.5999999940395,"domainLookupEnd":739.5999999940395,"domainLookupStart":739.5999999940395,"fetchStart":739.5999999940395,"redirectEnd":0,"redirectStart":0,"requestStart":739.5999999940395,"responseEnd":809.7999999821186,"responseStart":809.7999999821186,"secureConnectionStart":739.5999999940395},{"duration":74.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/53a43b6764f587426c7bb9a150184c00-CDN/lu2bv2/820016/12ta74/be4b45e9cec53099498fa61c8b7acba4/_/download/contextbatch/js/jira.project.sidebar,-_super,-project.issue.navigator,-jira.general,-jira.browse.project,-jira.view.issue,-jira.global,-atl.general,-com.atlassian.jira.projects.sidebar.init/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true","startTime":740,"connectEnd":740,"connectStart":740,"domainLookupEnd":740,"domainLookupStart":740,"fetchStart":740,"redirectEnd":0,"redirectStart":0,"requestStart":740,"responseEnd":814.5,"responseStart":814.5,"secureConnectionStart":740}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":3,"responseStart":181,"responseEnd":185,"domLoading":184,"domInteractive":891,"domContentLoadedEventStart":891,"domContentLoadedEventEnd":927,"domComplete":1162,"loadEventStart":1162,"loadEventEnd":1164,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":858},{"name":"bigPipe.sidebar-id.end","time":858.6999999880791},{"name":"bigPipe.activity-panel-pipe-id.start","time":858.8999999761581},{"name":"bigPipe.activity-panel-pipe-id.end","time":861.1999999880791},{"name":"activityTabFullyLoaded","time":937.2999999821186}],"measures":[],"correlationId":"b822c27f5c39d9","effectiveType":"4g","downlink":10,"rtt":0,"serverDuration":115,"dbReadsTimeInMs":20,"dbConnsTimeInMs":30,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}