The DML flow consists of simple INSERT / UPDATE / DELETE statements plus BEGIN/COMMIT, executed in several threads. Each thread sets its session value of gtid_domain_id to CONNECTION_ID(), so the value is unique for every thread. Row-based replication promptly fails, and after that an assertion failure happens. I assume that the assertion failure is one of the known issues with error handling, but the replication failure itself shouldn't be happening in the first place.
The result is the same whether the slave uses GTID or not. In the standard RQG it does not, but if you want to try with GTID, you can either start the servers separately, or apply the patch for RQG provided below.
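For the "start the servers separately" route, switching an existing slave to GTID mode by hand could look roughly like the sketch below. It assumes a MariaDB 10.0+ slave that already has working replication from the master; it is not the RQG patch mentioned above.

```sql
-- Run on the slave; assumes replication from the master is already configured.
STOP SLAVE;
-- Continue from the GTID position the slave has recorded for itself.
CHANGE MASTER TO master_use_gtid = slave_pos;
START SLAVE;
-- The Using_Gtid column in the output should now report Slave_Pos.
SHOW SLAVE STATUS;
```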
RQG grammar (parallel-replication-2.yy):
query_init:
SET gtid_domain_id = CONNECTION_ID() ;
query:
transaction |
insert_replace | update | delete |
insert_replace | update | delete |
insert_replace | update | delete |
insert_replace | update | delete |
insert_replace | update | delete |
insert_replace | update | delete |
insert_replace | update | delete |
insert_replace | update | delete |
insert_replace | update | delete ;
set_domain_id:
SET gtid_domain_id = _digit ;
transaction:
START TRANSACTION |
COMMIT ;
insert_replace:
INSERT INTO _table (`pk`) VALUES (NULL) ;
update:
UPDATE _table SET _field_no_pk = value where ORDER BY _field_list LIMIT large_digit ;
delete:
DELETE FROM _table where_delete ORDER BY _field_list LIMIT small_digit ;
where:
|
WHERE _field_key < value |
WHERE _field_key IN ( value , value , value , value , value ) |
WHERE _field_key BETWEEN small_digit AND large_digit |
WHERE _field_key BETWEEN _tinyint_unsigned AND _int_unsigned ;
where_delete:
|
WHERE _field_key = value |
WHERE _field_key IN ( value , value , value , value , value ) |
WHERE _field_key BETWEEN small_digit AND large_digit ;
I pushed a fix of two separate issues. Now the RQG command line works for me
in the non-GTID case (no errors occur during replication).
The problems were: 1) In non-GTID mode, we should not attempt to do different
domains in parallel; 2) when we do group-committed transactions in parallel,
we did not correctly wait for the previous group of transactions to complete
before starting on the next one.
This fix does not fix the assertion in case of replication error. I will look
into that next.
Note that I think that in GTID mode, this test is expected to cause
replication to fail (but not to assert, of course). If I understand
correctly, the queries in different domains are not guaranteed to be
independent, so it is not necessarily valid to put them in different domains
and replicate them in parallel (correct me if I am wrong). This is not a
critique of the test case, which clearly was effective at finding numerous
bugs, just a note of something to be aware of.
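
A hypothetical illustration of such a conflict (the table name and values are made up and not taken from the test run): two master connections in different domains touch the same row, and the slave is free to apply the two domains in either order.

```sql
-- Hypothetical table and values, for illustration only.
-- Connection A on the master:
SET gtid_domain_id = 1;
INSERT INTO t1 (pk) VALUES (100);

-- Connection B on the master, executed after A has committed:
SET gtid_domain_id = 2;
DELETE FROM t1 WHERE pk = 100;

-- On the master the DELETE finds the row. A slave in GTID mode may apply
-- domain 2 before domain 1, in which case the row-based DELETE event
-- cannot find the row and replication stops with an error.
```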
Kristian Nielsen
>> the queries in different domains are not guaranteed to be independent, so it is not necessarily valid to put them in different domains and replicate them in parallel (correct me if I am wrong)
You are totally right, my bad. I'll take it into account when creating and running further tests. It should be easy enough to fix, e.g. by making different threads work with different default databases.
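A rough sketch of that idea, with hypothetical database and table names (the real grammar would need to pick the database per thread): each connection keeps its domain's changes inside its own schema, so events from different domains never touch the same rows.

```sql
-- Hypothetical database and table names, for illustration only.
-- Connection A:
SET gtid_domain_id = 1;
USE test_domain_1;
INSERT INTO t1 (pk) VALUES (NULL);

-- Connection B:
SET gtid_domain_id = 2;
USE test_domain_2;
INSERT INTO t1 (pk) VALUES (NULL);

-- test_domain_1.t1 and test_domain_2.t1 are separate tables, so the slave
-- can apply the two domains in any order without conflicts.
```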
Elena Stepanova
I've now fixed error handling, so that the replication should stop as expected
and no assertion happens.
As mentioned, the fact that an error happens (in GTID mode) is expected due to
conflicts between events in different replication domains.
When run in non-GTID mode, the test succeeds for me.
Thanks again for a great testcase that was very useful to find a number of
important issues.
Kristian Nielsen