[MDEV-30232] rpl.rpl_gtid_crash fails sporadically in BB with Timeout wait for SQL thread to catch up with IO thread
Created: 2022-12-14  Updated: 2023-11-28

Status: Stalled
Project: MariaDB Server
Component/s: Replication, Tests
Affects Version/s: 10.4, 10.5, 10.6, 10.8, 10.9, 10.10, 10.11, 11.0
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0

Type: Bug
Priority: Major
Reporter: Angelique Sklavounos (Inactive)
Assignee: Angelique Sklavounos (Inactive)
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-30231 rpl.rpl_gtid_crash fails sporadically... Closed

Description

https://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/34174

10.8 e8a2a70cf

rpl.rpl_gtid_crash 'innodb,row'          w4 [ fail ]
        Test ended at 2022-03-08 06:14:34
 
CURRENT_TEST: rpl.rpl_gtid_crash
mysqltest: At line 367: Timeout wait for SQL thread to catch up with IO thread
 
The result from queries just before the failure was:
< snip >
call mtr.add_suppression("Unexpected change of master binlog file name in the middle of GTID");
set sql_log_bin= 1;
connection server_1;
SET GLOBAL debug_dbug="+d,inject_error_writing_xid";
BEGIN;
INSERT INTO t1 VALUES (11);
COMMIT;
ERROR HY000: Error writing file 'master-bin' (errno: 28 "No space left on device")
SET GLOBAL debug_dbug="+d,crash_dispatch_command_before";
COMMIT;
Got one of the listed errors
SELECT @@GLOBAL.server_id;
@@GLOBAL.server_id
3
SELECT * from t1 WHERE a > 10 ORDER BY a;
a
gtid_check
Binlog pos ok
# Wait 30 seconds for SQL thread to catch up with IO thread
connection server_2;

The same timeout has also been seen at the second wait for the SQL thread (line 464 of the test):

10.6 c4ce012e4

rpl.rpl_gtid_crash 'innodb,row'          w4 [ fail ]
        Test ended at 2022-11-07 19:05:14
 
CURRENT_TEST: rpl.rpl_gtid_crash
mysqltest: At line 464: Timeout wait for SQL thread to catch up with IO thread
 
The result from queries just before the failure was:
< snip >
BEGIN;
INSERT INTO t1 VALUES (21);
COMMIT;
ERROR HY000: Error writing file 'master-bin' (errno: 28 "No space left on device")
SET GLOBAL debug_dbug="+d,crash_dispatch_command_before";
COMMIT;
Got one of the listed errors
SELECT @@GLOBAL.server_id;
@@GLOBAL.server_id
1
SELECT * from t1 WHERE a > 10 ORDER BY a;
a
13
14
gtid_check
Binlog pos ok
gtid_check
Current pos ok
# Wait 30 seconds for SQL thread to catch up with IO thread
connection server_2;



Comments
Comment by Andrei Elkin [ 2023-11-22 ]

A warning needs suppression:

https://buildbot.mariadb.org/#/builders/572/builds/4473/steps/8/logs/stdio

rpl.rpl_gtid_crash 'innodb,row' w14 [ fail ] Found warnings/errors in server log file!
Test ended at 2023-11-22 14:23:59
line
2023-11-22 14:23:50 5 [Warning] Slave I/O: SET @master_heartbeat_period to master failed with error: Lost connection to server during query, Internal MariaDB error code: 2013
^ Found warnings in /home/buildbot/amd64-debian-11-msan/build/mysql-test/var/14/log/mysqld.2.err

According to the slave-side error log, this warning may happen.
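
A suppression for that warning could look like the mysqltest fragment below. This is only a sketch, not the committed fix. The pattern text and the choice to run it on server_2 (the server whose mysqld.2.err contains the warning) are assumptions. sql_log_bin is switched off around the call on the assumption that the suppression itself should not be written to the binary log; the log above shows `set sql_log_bin= 1;` right after the existing suppression, which suggests the test already follows that convention.

```sql
# Assumed placement: next to the existing mtr.add_suppression calls in the test.
--connection server_2
set sql_log_bin= 0;
# Assumed pattern: matches the "Slave I/O" heartbeat warning quoted above.
call mtr.add_suppression("Slave I/O: SET @master_heartbeat_period to master failed with error");
set sql_log_bin= 1;
```

The pattern deliberately stops before the specific error text ("Lost connection to server during query"), so it would also hide other failures of the same statement. A narrower pattern may be preferable if only the lost-connection case is expected after the injected crash.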

Generated at Thu Feb 08 10:14:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.