When run after a master server crash, --tc-heuristic-recover=rollback leaves the server in an inconsistent state: the binlog still contains the transactions that the option rolled back.
A server recovered this way cannot safely be used for replication. For example, when such a recovered
ex-master is demoted to a slave, its binlog state needs further correction to subtract
the rolled-back transactions. Otherwise the "new" slave might claim
those transactions as locally present during the GTID master-slave connection protocol, while the actual "new" master may never have seen them (they never arrived while it was a slave, because of the crash).
Michael Widenius added a comment:
Currently we have two problems with the binary logs:
We truncate transactions from the binary log that were already fully written there (and thus may already be on the slave).
When we truncate a half-written transaction, we don't take into account that a slave may have already received part of it and will reuse the GTID if it reconnects to a master that died in the middle of writing the transaction to the binary log.
The fix should be:
All transactions fully written to the binary log should be rolled forward. A half-written or unwritten transaction should be rolled back.
If the master-slave connection closes before the slave receives a full transaction, the slave should reconnect to the master and ask for the transaction after the last fully received GTID.
Alternatively, it could use the last binlog file and position (which should work fine, as the master binlog was truncated).
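The recovery rule proposed above (roll forward what is fully written, roll back and truncate the rest) can be illustrated with a small sketch. This is not the server's recovery code: the `Event` model, the `"GTID"`/`"COMMIT"` event kinds, and the `plan_recovery` function are hypothetical names invented for this illustration, and a real binlog has many more event types.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    """Hypothetical, simplified binlog event (an assumption, not the real format)."""
    offset: int     # byte offset where the event starts
    end: int        # byte offset just past the event
    kind: str       # "GTID" opens a transaction, "COMMIT" closes it
    gtid: str = ""  # only meaningful on "GTID" events


def plan_recovery(events: List[Event]) -> Tuple[List[str], int]:
    """Return (GTIDs to roll forward, offset at which to truncate the binlog).

    A transaction counts as fully written only if its closing COMMIT event
    is present; anything after the last COMMIT is treated as half-written.
    """
    roll_forward: List[str] = []
    truncate_at = events[0].offset if events else 0
    current = None
    for ev in events:
        if ev.kind == "GTID":
            current = ev.gtid
        elif ev.kind == "COMMIT" and current is not None:
            roll_forward.append(current)
            truncate_at = ev.end  # everything up to here is complete
            current = None
    return roll_forward, truncate_at


# The crash happened while the second transaction was being written.
log = [
    Event(0, 10, "GTID", "0-1-1"),
    Event(10, 50, "QUERY"),
    Event(50, 60, "COMMIT"),
    Event(60, 70, "GTID", "0-1-2"),
    Event(70, 90, "QUERY"),
]
print(plan_recovery(log))
```

In this model only the tail after offset 60 is removed, so a transaction that may already have reached a slave is never truncated.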
Andrei Elkin added a comment:
monty, regarding the fully written ones: when a server restarts with both rpl_semi_sync_MASTER_enabled and rpl_semi_sync_SLAVE_enabled set, the semisync slave recovery does not check the first of the two. Hence a server intended to be a master (that is, one with rpl_semi_sync_MASTER_enabled = true) rolls back in-doubt transactions unnecessarily and possibly harmfully.
That's an issue. It could be trivially fixed by adding a check of rpl_semi_sync_MASTER_enabled; when it is true, it would override the slave-side setting.
The half-written case is an issue only because the slave is reconnecting to a master that went through the semisync recovery with the intent to become a slave (such a master may discard from its binlog a transaction that the slave was receiving).
It will disappear once the new rpl_semi_sync_MASTER_enabled-based decision is implemented.
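As a rough illustration of the check described in this comment, the sketch below contrasts the behaviour reported here with the proposed override. The function names are hypothetical, and the two booleans merely stand in for rpl_semi_sync_MASTER_enabled and rpl_semi_sync_SLAVE_enabled; the actual server logic is more involved.

```python
def rollback_in_doubt_current(master_enabled: bool, slave_enabled: bool) -> bool:
    """Behaviour as described in the comment: only the slave flag is consulted."""
    return slave_enabled


def rollback_in_doubt_proposed(master_enabled: bool, slave_enabled: bool) -> bool:
    """Proposed behaviour: the master flag, when set, overrides the slave flag."""
    if master_enabled:
        return False  # the server is meant to stay a master: keep in-doubt transactions
    return slave_enabled


# Both flags set: today the in-doubt transactions are rolled back,
# with the proposed check they would be kept.
print(rollback_in_doubt_current(True, True), rollback_in_doubt_proposed(True, True))
```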
Michael Widenius added a comment (edited):
MDEV-33424 is not acceptable as it does not solve the issue at hand. Please fix this issue the way I have described!
To make things clear, this is how to fix this issue:
rpl_semi_sync_slave_enabled should not have anything to do with recovery.
We should only delete things from the binary log that were found to be half written during recovery. This could be a new option that could default to on.
I personally don't think we need an option for this, as this should always be safe to do (and should save disk space).
We should ensure that when a slave reconnects, it continues from the next transaction after the last full transaction it has read.
In other words, if the slave has read GTID-1 and the master crashes while the slave is reading GTID-2, then the slave should ask for 'next transaction after GTID-1'.
Alternatively, it can ask for the transaction based on the last binlog position it was using.
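The reconnect rule in this comment can be sketched from the slave's side as follows. This is only an illustration under simplifying assumptions: `SlaveReceiver`, its methods, and the `"GTID"`/`"COMMIT"` event kinds are hypothetical and do not correspond to the real replication protocol or I/O thread code.

```python
from typing import List, Optional


class SlaveReceiver:
    """Hypothetical model of a slave that tracks the last fully received GTID."""

    def __init__(self) -> None:
        self.last_full_gtid: Optional[str] = None
        self._pending_gtid: Optional[str] = None
        self._pending_events: List[str] = []

    def on_event(self, kind: str, payload: str = "") -> None:
        if kind == "GTID":
            # A new transaction starts; nothing is final yet.
            self._pending_gtid = payload
            self._pending_events = []
        elif kind == "COMMIT":
            # Only a complete transaction advances the resume point.
            if self._pending_gtid is not None:
                self.last_full_gtid = self._pending_gtid
            self._pending_gtid = None
            self._pending_events = []
        else:
            self._pending_events.append(payload)

    def on_disconnect(self) -> Optional[str]:
        """Discard any partial transaction; return the GTID to resume after."""
        self._pending_gtid = None
        self._pending_events = []
        return self.last_full_gtid


slave = SlaveReceiver()
slave.on_event("GTID", "GTID-1")
slave.on_event("QUERY", "INSERT ...")
slave.on_event("COMMIT")
slave.on_event("GTID", "GTID-2")       # master crashes mid-transaction
slave.on_event("QUERY", "UPDATE ...")
print(slave.on_disconnect())           # resume after this GTID
```

Because the partial GTID-2 is dropped rather than recorded, the slave never claims a transaction that the recovered master has truncated from its binlog.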
People
Brandon Nesterenko
Andrei Elkin
Documentation is updated with the following diff.