Please note that the test case requires semisync plugins, so they should be built and placed in the plugin folder.
On the same reason the test case is not applicable to versions below 5.5.
The test case works pretty reliably for me, but a couple of times (out of dozens) it didn't cause the assertion, so I needed to re-run it.
The fix for this bug was removed from 10.0 (probably accidentally in 5.6 merge). Elena, could you please test this again in 10.0 tree to see whether the bug reappears?
Sergei Golubchik
added a comment - - edited The fix for this bug was removed from 10.0 (probably accidentally in 5.6 merge). Elena, could you please test this again in 10.0 tree to see whether the bug reappears?
I couldn't reproduce the failure on 10.0 tree so far (still trying).
But I'm also not getting "Forcing close of thread ..." warnings, so maybe something changed that was a prerequisite for this use case.
Elena Stepanova
added a comment - - edited I couldn't reproduce the failure on 10.0 tree so far (still trying).
But I'm also not getting "Forcing close of thread ..." warnings, so maybe something changed that was a prerequisite for this use case.
------------------------------------------------------------
revno: 3427.1.194
revision-id: knielsen@knielsen-hq.org-20130424110540-hf6f77511ahhuios
timestamp: Wed 2013-04-24 13:05:40 +0200
message:
Add missing check for thd->killed in mysql_binlog_send().
The slave dump thread running on the master only checked thd->killed whenever
it reached the end of a binlog file, not between events. This could
unnecessarily delay server shutdown.
This was found by code inspection while tracking down some occasional "forcing
close of thread..." errors in Buildbot. Hopefully this will fix the failures,
but the fix is correct in any case.
Also increase the wait during server shutdown, 2 seconds is a bit tight in
case of heavy I/O stall, and it seems better to delay shutdown a bit than
force-kill threads unnecessarily.
Also fix some races in test cases that restart the mysqld server. The .expect
file should be changed with --append_file, --remove_file + --write_file
creates a short window where mysqld can error out due to .expect file missing.
------------------------------------------------------------
for (int i= 0; (volatile int32) &thread_count && i < 100; i++)
+ /*
+ Give threads time to die.
+
+ In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little
+ as 2 seconds, depending on thread scheduling.
+
+ From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is
+ that on a server with heavy I/O load, it is quite possible for eg. an
+ fsync() of the binlog or whatever to cause something like LOCK_log to be
+ held for more than 2 seconds. We do not want to force kill threads in
+ such cases, if it can be avoided. Note that normally, the wait will be
+ much smaller than even 2 seconds, this is only a safety fallback against
+ stuck threads so server shutdown is not held up forever.
+ */
+ for (int i= 0; (volatile int32) &thread_count && i < 1000; i++)
my_sleep(20000);
/*
If I revert the change on the current 10.0 tree, the assertion failure starts happening again.
The increased sleep time doesn't look like a real fix to me, but I couldn't overcome it with increased rpl_semi_sync_master_timeout and MTR's shutdown-timeout, so maybe it is, after all.
I've set up the test on perro, just in case. It's under mariadb. 10.0 tree, revno 3850, with the reverted patch quoted above (it is also in mdev201/dif), built with BUILD/compile-pentium-debug-max-no-ndb.
To reproduce,
cd mdev201/mysql-test
perl ./mtr mdev201 -mysqld=plugin-rpl-semi-sync-master --mysqld=-plugin-rpl-semi-sync-slave
Elena Stepanova
added a comment - I found when it stopped happening on 10.0 tree. The following revision did it:
------------------------------------------------------------
revno: 3748 [merge]
revision-id: knielsen@knielsen-hq.org-20130429100354-0kujsw18axni9syq
timestamp: Mon 2013-04-29 12:03:54 +0200
message:
Merge 10.0-base -> 10.0
------------------------------------------------------------
More particularly, this one:
------------------------------------------------------------
revno: 3427.1.194
revision-id: knielsen@knielsen-hq.org-20130424110540-hf6f77511ahhuios
timestamp: Wed 2013-04-24 13:05:40 +0200
message:
Add missing check for thd->killed in mysql_binlog_send().
The slave dump thread running on the master only checked thd->killed whenever
it reached the end of a binlog file, not between events. This could
unnecessarily delay server shutdown.
This was found by code inspection while tracking down some occasional "forcing
close of thread..." errors in Buildbot. Hopefully this will fix the failures,
but the fix is correct in any case.
Also increase the wait during server shutdown, 2 seconds is a bit tight in
case of heavy I/O stall, and it seems better to delay shutdown a bit than
force-kill threads unnecessarily.
Also fix some races in test cases that restart the mysqld server. The .expect
file should be changed with --append_file, --remove_file + --write_file
creates a short window where mysqld can error out due to .expect file missing.
------------------------------------------------------------
Even more particularly, this change:
=== modified file 'sql/mysqld.cc'
— sql/mysqld.cc 2013-04-16 11:43:28 +0000
+++ sql/mysqld.cc 2013-04-24 11:05:40 +0000
@@ -1494,8 +1494,21 @@
Events::deinit();
end_slave();
/* Give threads time to die. */
for (int i= 0; (volatile int32 ) &thread_count && i < 100; i++)
+ /*
+ Give threads time to die.
+
+ In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little
+ as 2 seconds, depending on thread scheduling.
+
+ From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is
+ that on a server with heavy I/O load, it is quite possible for eg. an
+ fsync() of the binlog or whatever to cause something like LOCK_log to be
+ held for more than 2 seconds. We do not want to force kill threads in
+ such cases, if it can be avoided. Note that normally, the wait will be
+ much smaller than even 2 seconds, this is only a safety fallback against
+ stuck threads so server shutdown is not held up forever.
+ */
+ for (int i= 0; (volatile int32 ) &thread_count && i < 1000; i++)
my_sleep(20000);
/*
If I revert the change on the current 10.0 tree, the assertion failure starts happening again.
The increased sleep time doesn't look like a real fix to me, but I couldn't overcome it with increased rpl_semi_sync_master_timeout and MTR's shutdown-timeout, so maybe it is, after all.
I've set up the test on perro, just in case. It's under mariadb. 10.0 tree, revno 3850, with the reverted patch quoted above (it is also in mdev201/dif), built with BUILD/compile-pentium-debug-max-no-ndb.
To reproduce,
cd mdev201/mysql-test
perl ./mtr mdev201 - mysqld= plugin-rpl-semi-sync-master --mysqld= -plugin-rpl-semi-sync-slave
People
Sergei Golubchik
Elena Stepanova
Votes:
0Vote for this issue
Watchers:
3Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
{"report":{"fcp":1238.1999998092651,"ttfb":190.80000019073486,"pageVisibility":"visible","entityId":11118,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"ba47a045-f04a-400e-9950-e55286c7fa7d","navigationType":0,"readyForUser":1368.8000001907349,"redirectCount":0,"resourceLoadedEnd":1300.6999998092651,"resourceLoadedStart":199.90000009536743,"resourceTiming":[{"duration":444.5,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":199.90000009536743,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":199.90000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":644.4000000953674,"responseStart":0,"secureConnectionStart":0},{"duration":444.5,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":200.19999980926514,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":200.19999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":644.6999998092651,"responseStart":0,"secureConnectionStart":0},{"duration":453.40000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":200.40000009536743,"connectEnd":200.40000009536743,"connectStart":200.40000009536743,"domainLookupEnd":200.40000009536743,"domainLookupStart":200.40000009536743,"fetchStart":200.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":200.40000009536743,"responseEnd":653.8000001907349,"responseStart":653.8000001907349,"secureConnectionStart":200.40000009536743},{"duration":540.4000000953674,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":200.59999990463257,"connectEnd":200.59999990463257,"connectStart":200.59999990463257,"domainLookupEnd":200.59999990463257,"domainLookupStart":200.59999990463257,"fetchStart":200.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":200.59999990463257,"responseEnd":741,"responseStart":741,"secureConnectionStart":200.59999990463257},{"duration":544.1999998092651,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":200.80000019073486,"connectEnd":200.80000019073486,"connectStart":200.80000019073486,"domainLookupEnd":200.80000019073486,"domainLookupStart":200.80000019073486,"fetchStart":200.80000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":200.80000019073486,"responseEnd":745,"responseStart":745,"secureConnectionStart":200.80000019073486},{"duration":544.5999999046326,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":201,"connectEnd":201,"connectStart":201,"domainLookupEnd":201,"domainLookupStart":201,"fetchStart":201,"redirectEnd":0,"redirectStart":0,"requestStart":201,"responseEnd":745.5999999046326,"responseStart":745.5999999046326,"secureConnectionStart":201},{"duration":544.8000001907349,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":201.19999980926514,"connectEnd":201.19999980926514,"connectStart":201.19999980926514,"domainLookupEnd":201.19999980926514,"domainLookupStart":201.19999980926514,"fetchStart":201.19999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":201.19999980926514,"responseEnd":746,"responseStart":746,"secureConnectionStart":201.19999980926514},{"duration":619.5,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":201.30000019073486,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":201.30000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":820.8000001907349,"responseStart":0,"secureConnectionStart":0},{"duration":545.8000001907349,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":201.59999990463257,"connectEnd":201.59999990463257,"connectStart":201.59999990463257,"domainLookupEnd":201.59999990463257,"domainLookupStart":201.59999990463257,"fetchStart":201.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":201.59999990463257,"responseEnd":747.4000000953674,"responseStart":747.4000000953674,"secureConnectionStart":201.59999990463257},{"duration":619.3000001907349,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":201.69999980926514,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":201.69999980926514,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":821,"responseStart":0,"secureConnectionStart":0},{"duration":546.2999997138977,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":201.80000019073486,"connectEnd":201.80000019073486,"connectStart":201.80000019073486,"domainLookupEnd":201.80000019073486,"domainLookupStart":201.80000019073486,"fetchStart":201.80000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":201.80000019073486,"responseEnd":748.0999999046326,"responseStart":748.0999999046326,"secureConnectionStart":201.80000019073486},{"duration":876.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":202.5,"connectEnd":202.5,"connectStart":202.5,"domainLookupEnd":202.5,"domainLookupStart":202.5,"fetchStart":202.5,"redirectEnd":0,"redirectStart":0,"requestStart":202.5,"responseEnd":1079,"responseStart":1079,"secureConnectionStart":202.5},{"duration":982.5999999046326,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":202.59999990463257,"connectEnd":202.59999990463257,"connectStart":202.59999990463257,"domainLookupEnd":202.59999990463257,"domainLookupStart":202.59999990463257,"fetchStart":202.59999990463257,"redirectEnd":0,"redirectStart":0,"requestStart":202.59999990463257,"responseEnd":1185.1999998092651,"responseStart":1185.1999998092651,"secureConnectionStart":202.59999990463257},{"duration":247,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":832.8000001907349,"connectEnd":832.8000001907349,"connectStart":832.8000001907349,"domainLookupEnd":832.8000001907349,"domainLookupStart":832.8000001907349,"fetchStart":832.8000001907349,"redirectEnd":0,"redirectStart":0,"requestStart":832.8000001907349,"responseEnd":1079.8000001907349,"responseStart":1079.8000001907349,"secureConnectionStart":832.8000001907349},{"duration":133.59999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/e65b778d185daf5aee24936755b43da6/_/download/contextbatch/js/browser-metrics-plugin.contrib,-_super,-project.issue.navigator,-jira.view.issue,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":1167.0999999046326,"connectEnd":1167.0999999046326,"connectStart":1167.0999999046326,"domainLookupEnd":1167.0999999046326,"domainLookupStart":1167.0999999046326,"fetchStart":1167.0999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1167.0999999046326,"responseEnd":1300.6999998092651,"responseStart":1300.6999998092651,"secureConnectionStart":1167.0999999046326}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":17,"responseStart":191,"responseEnd":194,"domLoading":197,"domInteractive":1444,"domContentLoadedEventStart":1444,"domContentLoadedEventEnd":1504,"domComplete":2142,"loadEventStart":2142,"loadEventEnd":2142,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1411.3000001907349},{"name":"bigPipe.sidebar-id.end","time":1412.0999999046326},{"name":"bigPipe.activity-panel-pipe-id.start","time":1412.1999998092651},{"name":"bigPipe.activity-panel-pipe-id.end","time":1414.0999999046326},{"name":"activityTabFullyLoaded","time":1522.5999999046326}],"measures":[],"correlationId":"2816005a23f9e1","effectiveType":"4g","downlink":9.3,"rtt":0,"serverDuration":103,"dbReadsTimeInMs":8,"dbConnsTimeInMs":16,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
Modified test:
--source include/master-slave.inc
--source include/have_binlog_format_mixed.inc
--source include/have_semisync_plugin.inc
eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
--connection slave
eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE;
START SLAVE;
--connection master
CREATE TABLE t1 (
pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
a TEXT
) ENGINE = MyISAM;
--delimiter |
CREATE PROCEDURE pr ()
BEGIN
DECLARE done INT DEFAULT 10000;
wl: WHILE done DO
INSERT INTO t1 (a) VALUES ( REPEAT('a',65536) ), ( REPEAT('b',65536) ), ( REPEAT('c',65536) ), ( REPEAT('d',65536) ), ( REPEAT('e',65536) );
SET done = done - 1;
END WHILE wl;
END|
--delimiter ;
--send
CALL pr();
--let $rpl_server_number= 1
--source include/rpl_stop_server.inc