Each read-write transaction uses a rollback segment to store its undo records (needed for rollback).
Currently, InnoDB has at most 128 rollback segments, and they are shared by all threads.
This means that if a user runs a workload with 1024 threads, 8 threads will use the same rollback segment (1024 / 128 = 8), assuming that each thread is running a transaction at any given point in time.
On a NUMA machine the contention is compounded by NUMA scalability limits, because the same rseg may have to be accessed by multiple threads located (possibly) on different NUMA nodes.
All this makes the rseg mutex one of the hottest mutexes.
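The sharing factor can be illustrated with a small sketch. It assumes that concurrently active transactions are spread over the rollback segments in round-robin order; `kMaxRsegs` and `rseg_load` are hypothetical names for illustration, not InnoDB identifiers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical constant mirroring the limit of 128 rollback segments.
constexpr std::size_t kMaxRsegs = 128;

// Spread n_threads concurrently active transactions over the rollback
// segments in round-robin order (an assumption of this sketch) and return
// how many transactions end up sharing each segment.
std::vector<std::size_t> rseg_load(std::size_t n_threads,
                                   std::size_t n_rsegs = kMaxRsegs)
{
  std::vector<std::size_t> load(n_rsegs, 0);
  for (std::size_t t = 0; t < n_threads; ++t)
    ++load[t % n_rsegs];
  return load;
}
```

Calling `rseg_load(1024)` gives a load of 8 for every segment, which is the figure quoted above. With fewer threads than segments, each busy segment has a single user and the mutex is uncontended.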
Testing was carried out using sysbench-update-index with 1024 threads on a machine with 4 NUMA nodes (2 sockets), on the 10.6 branch (#80ac9ec1).
EVENT_NAME | WAIT_MS | COUNT_STAR
wait/synch/mutex/innodb/redo_rseg_mutex | 102697560.5683 | 58758773
wait/synch/mutex/innodb/log_sys_mutex | 49862366.5623 | 73277659
wait/synch/mutex/innodb/dict_sys_mutex | 21348825.2544 | 71961431

EVENT_NAME | WAIT_MS | COUNT_STAR
wait/synch/mutex/innodb/redo_rseg_mutex | 0.0905 | 2058
<70 secs of update-index workload with 1024 threads>
wait/synch/mutex/innodb/redo_rseg_mutex | 21298612.2935 | 17804286
<70 secs of update-index workload with 1024 threads>
wait/synch/mutex/innodb/redo_rseg_mutex | 61677736.5123 | 37997346
<70 secs of update-index workload with 1024 threads>
wait/synch/mutex/innodb/redo_rseg_mutex | 102884379.5462 | 58871725
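To see how much each 70-second run adds, the redo_rseg_mutex rows can be differenced. The sketch below assumes that the rows are cumulative counters sampled before and after each run; the `Snapshot` and `Delta` types and the `deltas` helper are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One sampled row of the redo_rseg_mutex counters.
struct Snapshot { double wait_ms; std::uint64_t count_star; };

// Growth between two consecutive snapshots, plus the average wait per event.
struct Delta { double wait_ms; std::uint64_t count_star; double avg_wait_ms; };

// Difference consecutive snapshots (assumed to be cumulative counters).
std::vector<Delta> deltas(const std::vector<Snapshot>& s)
{
  std::vector<Delta> out;
  for (std::size_t i = 1; i < s.size(); ++i) {
    const double w = s[i].wait_ms - s[i - 1].wait_ms;
    const std::uint64_t c = s[i].count_star - s[i - 1].count_star;
    out.push_back({w, c, c ? w / static_cast<double>(c) : 0.0});
  }
  return out;
}

// The four redo_rseg_mutex samples reported in the measurements.
const std::vector<Snapshot> redo_rseg_mutex_snapshots = {
  {0.0905, 2058},
  {21298612.2935, 17804286},
  {61677736.5123, 37997346},
  {102884379.5462, 58871725},
};
```

`deltas(redo_rseg_mutex_snapshots)` returns one entry per 70-second run, so the growth in both columns can be compared across runs.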
Issue Links
causes
MDEV-26193 Skip check to invoke purge if the transaction is read-only (Closed)
MDEV-27935 Enable performance_schema profiling for trx_rseg_t latch (Closed)
relates to
MDEV-11383 AliSQL: [Feature] Issue#29: ADD INFORMATION_SCHEMA.INNODB_RSEG TABLE TO DISPLAY THE ROLLBACK INFORMATION
Marko Mäkelä added a comment:
The observed performance bottleneck can be addressed in two ways. The short-term solution is to retain the current file format and allow a more efficient assignment of the 128 rollback segments to transactions.
The long-term solution would be a file format change. When it comes to that, I noticed my old comment in MDEV-11657 (which is basically a scratchpad of loose ideas):
In DB_ROLL_PTR, the rollback segment ID could identify the undo tablespace. Theoretically, given that each DB_TRX_ID has only one persistent rollback segment, we would not even need that; MVCC could look up the undo tablespace based on the DB_TRX_ID. This would require extending main memory data structures so that some data of committed transactions would be stored until the transactions are purged.
We could repurpose the 7 bits in DB_ROLL_PTR to be flags for future use (always write them as zero from now on), and retire the TRX_SYS page which was demoted into a mere directory of undo tablespace header pages in MDEV-15158.
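For reference, the 7 bits in question can be shown with a minimal sketch of the 7-byte DB_ROLL_PTR. It assumes the current layout as I understand it: 1 insert-flag bit, a 7-bit rollback segment ID, a 32-bit undo page number and a 16-bit offset within the page. The function and constant names are illustrative, not the InnoDB ones.

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit positions within the 56-bit roll pointer (illustrative names).
constexpr unsigned kInsertFlagPos = 55;  // 1 bit: insert undo record?
constexpr unsigned kRsegIdPos = 48;      // 7 bits: rollback segment ID
constexpr unsigned kPagePos = 16;        // 32 bits: undo page number
                                         // low 16 bits: offset in the page

// Pack the four fields into one 56-bit value.
constexpr std::uint64_t encode_roll_ptr(bool is_insert, std::uint8_t rseg_id,
                                        std::uint32_t page_no,
                                        std::uint16_t offset)
{
  return (std::uint64_t{is_insert} << kInsertFlagPos)
       | (std::uint64_t{rseg_id} & 0x7F) << kRsegIdPos
       | std::uint64_t{page_no} << kPagePos
       | offset;
}

// Extract the 7-bit rollback segment ID; these are the bits that the
// comment proposes to repurpose as flags.
constexpr std::uint8_t roll_ptr_rseg_id(std::uint64_t roll_ptr)
{
  return static_cast<std::uint8_t>((roll_ptr >> kRsegIdPos) & 0x7F);
}

constexpr std::uint32_t roll_ptr_page_no(std::uint64_t roll_ptr)
{
  return static_cast<std::uint32_t>(roll_ptr >> kPagePos);
}

constexpr std::uint16_t roll_ptr_offset(std::uint64_t roll_ptr)
{
  return static_cast<std::uint16_t>(roll_ptr);
}
```

A 7-bit field can address exactly 128 values, which matches the current limit of 128 rollback segments. Under the proposal these bits would always be written as zero and the undo tablespace would be found from DB_TRX_ID instead.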
We could allow any number of undo tablespaces (much larger than 128). On startup, we would recover the undo log header pages from each undo tablespace that is found (based on a file name), as well as recover the rollback segment of each active transaction. Each undo tablespace could contain multiple rollback segments, as defined by the new undo tablespace format.
If we went this route, we would probably refuse server startup if the undo logs are not empty, so that we will not have to support two undo log formats in the same executable.
Marko Mäkelä added a comment:
I understood that there is an observable performance regression in 10.6 compared to 10.4. It could possibly be related to MDEV-21452, which removed the spinloop on the rollback segment mutex.
I just finished a prototype that not only replaces the normal mutex with srw_mutex (so that it will use a spinloop on Linux and OpenBSD again, and be SRWLOCK on Windows, and pthread_mutex_t on anything else) but also removes some completely needless acquisition of the mutex. Furthermore, we will use relaxed atomic memory operations around the reference-counting, so that the mutex will not be needed at transaction start.
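The reference-counting idea can be sketched roughly as follows. This is not the prototype's code: `Rseg`, `assign_rseg` and the round-robin slot counter are hypothetical names, and the sketch only shows how a transaction could take a reference with a relaxed atomic increment instead of acquiring the mutex.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical rollback segment with a lock-free reference count.
struct Rseg {
  std::atomic<std::uint32_t> ref{0};  // transactions currently using it

  // Relaxed ordering is enough for the count itself; anything that frees
  // or truncates the segment would need stronger synchronization.
  void acquire() { ref.fetch_add(1, std::memory_order_relaxed); }
  void release() { ref.fetch_sub(1, std::memory_order_relaxed); }
  bool is_referenced() const
  { return ref.load(std::memory_order_relaxed) != 0; }
};

constexpr std::size_t kRsegs = 128;    // limit quoted in the description
Rseg rsegs[kRsegs];
std::atomic<std::size_t> next_slot{0}; // round-robin cursor (assumption)

// Pick the next segment round-robin and take a reference, without
// touching any mutex at transaction start.
Rseg& assign_rseg()
{
  const std::size_t slot =
    next_slot.fetch_add(1, std::memory_order_relaxed) % kRsegs;
  rsegs[slot].acquire();
  return rsegs[slot];
}
```

A transaction would call `assign_rseg()` when it first needs to write undo log and `release()` on the returned segment at commit or rollback. The mutex would then only be needed for operations that modify the segment's undo log lists.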
People
Marko Mäkelä
Krunal Bauskar