The expected outcome is that all undo tablespaces have been truncated to their default soft limit size (innodb_max_undo_log_size=10M). Instead, we observe that one of the undo tablespace files is larger.
I think that undo tablespace truncation also needs to work while InnoDB is running (mostly idle, with some writes every now and then). It appears that the parameter innodb_purge_rseg_truncate_frequency caused a call to trx_purge_truncate_history() to be skipped during the last purge batch. That batch made the undo logs logically empty, but the space was not reclaimed.
I originally noticed this when testing an upgrade from a server that is affected by MDEV-31234.
Issue Links
is blocked by
MDEV-31355: innodb_undo_log_truncate=ON fails to wait for purge of enough transaction history (Closed)
relates to
MDEV-29593: Purge misses a chance to free not-yet-reused undo pages (Closed)
MDEV-31234: InnoDB does not free UNDO after the fix of MDEV-30671, thus shared tablespace (ibdata1) may grow indefinitely for no good reason
Marko Mäkelä added a comment - In 10.5, if I run the test with ./mtr --rr, the second slow shutdown will be so slow that mtr kills the process. In 10.6, the shutdown completes. During the server run that ends in the second shutdown, purge_coordinator_callback() is not invoked at all. The function trx_sys.history_size() returns 0 both times it is called, both from innodb_preshutdown().
It looks like the condition in srv_wake_purge_thread_if_not_active() needs to be revised so that it will trigger the purge even if no history exists, as long as undo tablespace truncation is enabled and useful. Similarly, the purge coordinator task needs to invoke trx_purge_truncate_history() once after the history list has become empty.
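The revised wake-up condition described above could be modeled roughly as follows. This is only a sketch: the struct and both functions are hypothetical simplifications, not the actual code of srv_wake_purge_thread_if_not_active(), and the "old" variant merely assumes that the existing check depends on the history list being non-empty.

```cpp
#include <cstddef>

// Hypothetical, simplified view of the inputs; not InnoDB's data structures.
struct PurgeState {
  std::size_t history_size;      // what trx_sys.history_size() would report
  bool truncate_enabled;         // models innodb_undo_log_truncate=ON
  bool undo_exceeds_soft_limit;  // some undo tablespace is larger than
                                 // innodb_max_undo_log_size
};

// Assumed current behavior: purge is woken only if history exists.
constexpr bool wake_purge_old(const PurgeState &s) {
  return s.history_size != 0;
}

// Proposed behavior: also wake purge when truncation is enabled and useful.
constexpr bool wake_purge_revised(const PurgeState &s) {
  return s.history_size != 0 ||
         (s.truncate_enabled && s.undo_exceeds_soft_limit);
}
```

With an empty history list, truncation enabled and an oversized undo tablespace, wake_purge_old() yields false while wake_purge_revised() yields true, which is the case from the test above.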
Marko Mäkelä added a comment - So far, I got the undo log truncation during slow shutdown to work for my test case. While working on it, I had to revise an unnecessarily strict condition that had originally been added in MDEV-30671:
@@ -643,7 +644,8 @@ TRANSACTIONAL_TARGET static void trx_purge_truncate_history()
         rseg.latch.rd_lock(SRW_LOCK_CALL);
         ut_ad(rseg.skip_allocation());
-        if (rseg.is_referenced() || rseg.needs_purge > head.trx_no)
+        if (rseg.is_referenced() ||
+            (rseg.needs_purge > head.trx_no && head.trx_no))
         {
 not_free:
           rseg.latch.rd_unlock();
This condition must be revised in MDEV-31355 anyway.
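To make the effect of the changed condition easier to see, here is a minimal standalone model of the two predicates from the patch. The Rseg struct and the function names are hypothetical stand-ins; only the boolean expressions are taken from the diff.

```cpp
#include <cstdint>

// Hypothetical stand-in for the rollback segment fields used in the patch.
struct Rseg {
  bool referenced;            // models rseg.is_referenced()
  std::uint64_t needs_purge;  // models rseg.needs_purge
};

// Condition before the patch: true means "goto not_free" (skip truncation).
constexpr bool not_free_before(const Rseg &r, std::uint64_t head_trx_no) {
  return r.referenced || r.needs_purge > head_trx_no;
}

// Condition after the patch: the needs_purge comparison is applied only
// when head_trx_no is nonzero.
constexpr bool not_free_after(const Rseg &r, std::uint64_t head_trx_no) {
  return r.referenced || (r.needs_purge > head_trx_no && head_trx_no != 0);
}
```

The two predicates differ only when head.trx_no is 0 and rseg.needs_purge is nonzero: the old condition refuses to truncate in that case, the new one allows it. A referenced rollback segment blocks truncation in both versions.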
Marko Mäkelä added a comment - A call to trx_purge_truncate_history() will attempt to truncate all undo tablespaces whose size exceeds the soft limit innodb_max_undo_log_size.
I also tested my fix outside shutdown:
--source include/have_innodb.inc
--source include/have_sequence.inc
SET GLOBAL innodb_undo_log_truncate=OFF;
CREATE TABLE t(a INT PRIMARY KEY, b INT UNIQUE) ENGINE=InnoDB;
INSERT INTO t SELECT seq, NULL FROM seq_1_to_130000;
UPDATE t SET b=a;
DROP TABLE t;
SET GLOBAL innodb_undo_log_truncate=ON;
SET GLOBAL innodb_max_purge_lag_wait=0;
My fix will cause SET GLOBAL innodb_undo_log_truncate=ON to wake up the purge coordinator if it is not running.
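The selection rule stated at the start of this comment (truncate every undo tablespace whose size exceeds the soft limit) can be sketched as below. UndoSpace and count_truncation_candidates() are hypothetical names used for illustration only; the real trx_purge_truncate_history() works on InnoDB's own tablespace objects and performs further checks.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical description of an undo tablespace; size is in bytes.
struct UndoSpace {
  std::uint32_t id;
  std::uint64_t size;
};

// Counts the tablespaces that exceed the soft limit, that is, the ones a
// truncation pass would attempt to shrink under the rule described above.
template <std::size_t N>
constexpr std::size_t count_truncation_candidates(const UndoSpace (&spaces)[N],
                                                  std::uint64_t soft_limit) {
  std::size_t candidates = 0;
  for (std::size_t i = 0; i < N; i++)
    if (spaces[i].size > soft_limit)  // "exceeds" is a strict comparison
      candidates++;
  return candidates;
}
```

For three tablespaces of 10 MiB, 64 MiB and 10 MiB with a 10 MiB limit, only the second one is a candidate, which matches the single oversized file observed in the report.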
Marko Mäkelä added a comment - Related to this, I was wondering if it would make sense to change the default value of the confusingly named parameter innodb_purge_rseg_truncate_frequency to 1 (for the maximum frequency), so that undo log pages would be freed more frequently even when using the default setting innodb_undo_log_truncate=OFF. axel tested that and found that it would slightly reduce throughput.
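Why a value of 1 means "maximum frequency" can be shown with a small model. It assumes that truncation is attempted on every Nth purge batch, where N is innodb_purge_rseg_truncate_frequency. The function names are hypothetical, the exact counting in the server may differ, and the value 128 is used here as the default on the assumption that it has not changed.

```cpp
// Assumed model: batches are numbered from 1 and truncation is attempted
// on every batch whose number is a multiple of `frequency` (valid: >= 1).
constexpr bool truncates_on_batch(unsigned batch_no, unsigned frequency) {
  return batch_no % frequency == 0;
}

// Number of truncation attempts made during the first `batches` batches.
constexpr unsigned truncate_calls(unsigned batches, unsigned frequency) {
  return batches / frequency;
}
```

Under this model, a mostly idle server that runs only 100 purge batches would never attempt truncation with a setting of 128, but would attempt it 100 times with a setting of 1. That is consistent with both the skipped call described in the report and the small throughput cost that axel measured.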