Am running the following on a slave:
- Largish (24h, 600M rows, 200G) ALTER TABLE
- Events with INFORMATION_SCHEMA queries
- Thread pool active (thread_handling=pool-of-threads)
- Replication active
- No other significant traffic
After several hours, MariaDB locks up with 0% CPU and disk activity, and no response on existing or new connections on port, extra_port, or socket.
Attached are gdb backtraces for two occurrences, examples of the ALTER and the INFORMATION_SCHEMA activity, and other info. Would appreciate any insight from devs to identify the deadlock, and to narrow down the variables for a test case that isn't 200G.
Am presently trialing the ALTER outside the threadpool using the extra_port, with all other settings unchanged.
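For anyone trying to reproduce the setup, the thread pool part of the configuration would look roughly like the fragment below. This is only a sketch: the port number and connection count are placeholders I made up for illustration, not the values from the affected server.

```ini
[mysqld]
# Use the thread pool for connections on the regular port and socket.
thread_handling = pool-of-threads

# Extra listener served one-thread-per-connection, outside the thread pool.
# 3307 is a hypothetical value; substitute whatever port is free on the host.
extra_port = 3307

# Hypothetical limit on connections accepted via extra_port.
extra_max_connections = 4
```

Connecting to extra_port instead of the regular port is what takes a session, such as the one running the ALTER, out of the thread pool.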
Other notes:
- It doesn't seem to be a thread pool overload, as there aren't enough threads in the backtrace.
- The INFORMATION_SCHEMA event traffic uses GET_LOCK to serialize some activity and prevent pile-up.
Tried disabling the thread pool, replication, and other traffic, all to no effect: the problem recurred. After each restart I waited until transactions had completed rollback.
Observing the threads stuck in buf_, mtr_, and log_ calls, I tried reverting to a single buffer pool instance, innodb_buffer_pool_instances=1 (which matches our 5.5 config). That allowed the ALTER to complete normally without a hiccup.
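In option-file form the workaround is just the one setting below. The section name is the usual server group and is an assumption about where the setting lives in a given install; the variable is not dynamic, so it only takes effect after a server restart.

```ini
[mysqld]
# Workaround: a single InnoDB buffer pool instance (matches the 5.5 config).
innodb_buffer_pool_instances = 1
```

After the restart, SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_instances' should confirm the value in effect.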
A flushing lock-cycle of some sort?