As described in MDEV-32020 (the details are not repeated here), two transactions cannot be isolated on the slave because they used different non-unique indexes on the master and the slave.
Since the first of the two is a prepared XA transaction,
--connection slave_worker_1
xa start 'xid'; /* ... lock here ... */ ; xa prepare 'xid'
the 2nd
--connection slave_worker_2
begin; /* ... get lock ... => wait/hang...error out */
could not wait out the conflicting lock, even though the XA transaction did not really
lock the 2nd transaction's target record in the non-clustered index.
The hang was really caused by the method used to reach the needed record, which is the index scan.
By the book, the scan could not step over a record that was rightfully locked by the XA transaction.
However, that record cannot be the target of the 2nd transaction, otherwise the transactions
would have sensed the conflict back on the master. So it would be alright not to panic
on seeing a timeout error from the engine. Instead the scan would just proceed to the next free index records with the same key value, and ultimately it must reach the target one.
More generally, on the way to its target the current transaction need not lock any busy records that belong to earlier (in binlog order) transactions.
A patch is implemented to carry out the description's agenda.
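
The scan behaviour described above can be illustrated with a toy model. This is only a sketch and is not taken from the patch: `IndexRecord`, `LockWaitTimeout`, `find_target` and the use of a binlog sequence number to identify lock holders are all hypothetical names and assumptions made for the illustration. The handling of a locked target record, and of a record locked by a later transaction, as a genuine conflict is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexRecord:
    """Toy stand-in for one record of a non-unique secondary index."""
    key: int                          # value of the non-unique index key
    row_id: int                       # identifies the row the record points to
    locked_by: Optional[int] = None   # binlog sequence number of the holder


class LockWaitTimeout(Exception):
    """Stands in for the engine's lock wait timeout error (hypothetical)."""


def find_target(records: List[IndexRecord], key: int,
                target_row_id: int, my_seq_no: int) -> Optional[IndexRecord]:
    """Scan records with the given key and lock the target record.

    Records held by transactions that are earlier in binlog order are
    stepped over instead of being waited for.
    """
    for rec in records:
        if rec.key != key:
            continue
        busy = rec.locked_by is not None and rec.locked_by != my_seq_no
        if busy:
            # Assumption: a busy target, or a holder that is later in
            # binlog order, is a genuine conflict and is still reported.
            if rec.row_id == target_row_id or rec.locked_by > my_seq_no:
                raise LockWaitTimeout(f"row {rec.row_id} is locked")
            # Held by an earlier transaction: it cannot be our target,
            # so the scan proceeds to the next record of the same key.
            continue
        if rec.row_id == target_row_id:
            rec.locked_by = my_seq_no
            return rec
    return None


if __name__ == "__main__":
    index = [
        IndexRecord(key=10, row_id=1, locked_by=1),  # held by the prepared XA
        IndexRecord(key=10, row_id=2),
        IndexRecord(key=10, row_id=3),
    ]
    print(find_target(index, key=10, target_row_id=3, my_seq_no=2))
```

In this model the second transaction (sequence number 2) passes the record held by the prepared XA transaction (sequence number 1) and locks only its own target.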
Issue Links
relates to MDEV-32020: XA transaction replicates incorrectly, must be applied at XA COMMIT, not XA PREPARE
Andrei Elkin
added a comment - Howdy Brandon!
Could you please have a look at bb-10.11-andrei?
You may not be the only reviewer, but let me pick you first.
Cheers,
Andrei
Andrei Elkin
added a comment -
marko, thank you for attending to this ticket!
Sure, the InnoDB team needs to look at the patch. Actually that was already under way in the form of MDEV-34466, which I had to find out about as it was blocking my progress on these fixes (they include a hunk for the latter bug). By the way, the fixes do not really change the innodb_lock_wait_timeout policy, at least never beyond replication.
In my opinion it is much fairer to view this work as removing limitations (which they are) of the engine/server locking protocol, an effort started by MDEV-26682 and followed up in MDEV-33454.
As to the comparison of this and Kristian's method of resolving MDEV-32020, I can only repeat what has been said multiple times.
Arguably they are not mutually exclusive.
Yet MDEV-742 is clearly the preferred choice when the user requires failover to the slave.
As failover has to be reliable and fast, the method of collecting XA events and applying them later simply may not be an option (because it *hopes* that the replay will succeed and that the time spent on it is always affordable).
Nor am I certain that implementing the collection of XA events for deferred applying is really straightforward.
Kristian Nielsen
added a comment -
> Neither I am certain that implementation of collecting for deferred applying XA events is really straightforward.
There is already a (prototype) implementation of this, see knielsen_mdev32020.
> Yet MDEV-742 is clearly a preferred choice when the user requires failover to slave.
Really? Failing over a prepared XA transaction is a (very) rare operation. The normal apply of an XA transaction will occur thousands or millions of times more often. Applying the XA PREPARE on the slave pessimises the common operation by doubling the work on the slave, which has to process two event groups, two GTIDs, and two commits inside InnoDB.
I think there's a misconception that the XA PREPAREd transaction will normally already be applied on the slave when a failover occurs. That's unlikely to be the case, especially for transactions that take longer to apply, as the commit on the master will normally arrive shortly after the prepare, while the slave can only start applying the XA PREPARE after it has been synced to the binlog on the master.
For the rare user who really wants to recover an XA PREPAREd (but not committed) master transaction on the slave, it will be necessary to apply the to-be-recovered prepare on the slave. The requirements for this are similar, though it is simplified in the MDEV-32020 proposal since there is no requirement to apply them in order as in the current code. However, there is no need to do so for the vast majority of transactions that are prepared+committed on the master; that just introduces a lot of unnecessary overhead and complications.
Andrei Elkin
added a comment - edited
>> Yet MDEV-742 is clearly a preferred choice when the user requires failover to slave.
> Really? The failing over of a prepared XA transaction is a (very) rare operation.
Sorry, but with all due respect to you, dear Kristian, 'rare operation' is something that no one but an actual user can claim.
And we don't know the future either.
The same goes for 'normally' here:
> as the commit on the master will normally arrive shortly after the prepare
Let's admit that these are all your own adjectives and assumptions, which may not hold true.
On the usability matter, please also refer to the state of XA replication in MySQL at Oracle.
They have had this solution in place since 2013, and I have not heard of any ideas on their side to cancel it.
I suggest leaving this matter alone. It's a firm and obvious fact that only the eager replication of
XA PREPARE provides instant recovery.
One could say this approach takes its toll. Actually it does not; see the dismissal of the "extra fsync" concern.
> two GTIDs, and two commits inside InnoDB.
However, it's not about 'doubling the work' at all. Brandon and I shared reassuring benchmark results in MDEV-31949. There's no reason to doubt that they could be improved still.
Andrei Elkin
added a comment - edited
Howdy Susil!
Could you please run concurrency tests to prove the fixes?
They are currently pushed to bb-10.6-andrei, which is based on a not fully completed MDEV-34466 branch. That was necessary to avoid some engine errors in a service that XA replication relies on.
Please also read the commit message for more on what the patch is about.
An mtr test in the rpl suite should give some insight into how and what to test.
In brief, I have in mind a worker pool of arbitrary size and multiple clients on the master running mixed
normal and XA transactions that update tables having only non-unique indexes (including NULL-able unique ones),
replicated in ROW format.
All slaves, whether sequential or parallel (optimistic or conservative), should complete the work
with consistent data and GTID state in the end.
The GTID connection mode (aka CHANGE MASTER TO master_use_gtid) is irrelevant.
GTID strict mode should be on.
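
As an illustration of the workload outlined above, here is a minimal generator sketch. It is not part of the patch or of any existing test: the table `t1`, its columns, the function `make_workload` and its parameters are all hypothetical. It only emits SQL text and leaves the execution on master connections and the slave-side checks to the test harness.

```python
import random
from typing import List

# Hypothetical table with a non-unique index only (no primary key).
CREATE_TABLE = "CREATE TABLE t1 (a INT, b INT, KEY (a)) ENGINE=InnoDB"


def make_workload(n_trx: int, n_keys: int, seed: int = 0,
                  xa_ratio: float = 0.5) -> List[List[str]]:
    """Return n_trx transactions, each a list of SQL statements.

    Roughly xa_ratio of them are XA transactions, the rest are normal
    ones. All of them update rows through the non-unique key `a`.
    """
    rng = random.Random(seed)  # seeded so that a failing run can be replayed
    workload = []
    for i in range(n_trx):
        a = rng.randrange(n_keys)
        update = f"UPDATE t1 SET b = b + 1 WHERE a = {a}"
        if rng.random() < xa_ratio:
            xid = f"'xid_{i}'"
            stmts = [
                f"XA START {xid}",
                update,
                f"XA END {xid}",
                f"XA PREPARE {xid}",
                f"XA COMMIT {xid}",
            ]
        else:
            stmts = ["BEGIN", update, "COMMIT"]
        workload.append(stmts)
    return workload


if __name__ == "__main__":
    print(CREATE_TABLE + ";")
    for trx in make_workload(n_trx=4, n_keys=3, seed=1):
        print("; ".join(trx) + ";")
```

Distributing the generated transactions over several client connections, so that they actually run concurrently on the master, is left out of the sketch.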
Kristian Nielsen
added a comment - Some comments on the patch: https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/KFTWZ2CCNRFDJ77B4G4TGIHXMVMCFVHC/
Marko Mäkelä
added a comment - I suggest that this be tested also with innodb_snapshot_isolation=ON. Hopefully it will replace some lock waits with other errors (ER_LOCK_WAIT_TIMEOUT, ER_CHECKREAD).
People
Susil Behera
Andrei Elkin