[MDEV-33316] rpl_change_master_demote binlog race condition
Created: 2024-01-25  Updated: 2024-01-25

Status: In Progress
Project: MariaDB Server
Component/s: Replication, Tests
Affects Version/s: 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.11, 11.0, 11.1, 11.2, 11.3

Type: Bug
Priority: Major
Reporter: Brandon Nesterenko
Assignee: Brandon Nesterenko
Resolution: Unresolved
Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-29517 rpl.rpl_change_master_demote sporadic... Closed

Description

Between test case 3 and test case 4 of rpl_change_master_demote.test, the master can send ER_GTID_POSITION_NOT_FOUND_IN_BINLOG2 to the slave. The cause is a race: the slave requests to start at a GTID that is not in the master's binlogs while the master concurrently binlogs a transaction. That transaction puts the master's gtid_binlog_pos ahead of the requested slave_connect_state.
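
The condition described above can be sketched as a toy model. This is only an illustration and not the server's implementation: the helper names (Gtid, parse_gtid, connect_outcome) and the returned strings are hypothetical, and the sketch assumes a single-domain position of the form domain-server-seq. The GTIDs 0-1-10 and 0-2-9 come from the failure log in this report; treating them as the master's gtid_binlog_pos and the slave's connect state, respectively, is an assumed reading of that log.

```python
from typing import Iterable, NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse a single 'domain-server-seq' GTID such as '0-1-10'."""
    domain_id, server_id, seq_no = (int(part) for part in text.strip().split("-"))
    return Gtid(domain_id, server_id, seq_no)


def connect_outcome(master_binlog: Iterable[str], slave_connect_state: str) -> str:
    """Toy model of the race in this report (hypothetical, NOT the server code)."""
    requested = parse_gtid(slave_connect_state)
    binlogged = [parse_gtid(g) for g in master_binlog]

    # The requested start GTID exists in the master's binlog: nothing to report.
    if requested in binlogged:
        return "found"

    # The requested GTID is absent. If the master has already binlogged a
    # higher sequence number in the same domain, the master is "ahead" of
    # the position the slave asked for, which is the failing case here.
    if any(
        g.domain_id == requested.domain_id and g.seq_no > requested.seq_no
        for g in binlogged
    ):
        return "ER_GTID_POSITION_NOT_FOUND_IN_BINLOG2"

    # Absent, but the master has not binlogged anything newer in this domain.
    return "not found, master not ahead"


# Race lost: the master binlogged 0-1-10 before the slave's request for 0-2-9.
print(connect_outcome(["0-1-10"], "0-2-9"))
# Race won: the master has not yet binlogged a newer transaction.
print(connect_outcome([], "0-2-9"))
```

In this model the only difference between the passing and failing runs is whether the master's extra transaction is already in its binlog when the slave's request is evaluated, which is why the failure is sporadic.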

The resulting failure looks like:

rpl.rpl_change_master_demote 'mix'       w1 [ fail ]
        Test ended at 2024-01-24 10:39:40
 
CURRENT_TEST: rpl.rpl_change_master_demote
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /home/buildbot/buildbot/build/mariadb-10.11.7/mysql-test/suite/rpl/t/rpl_change_master_demote.test at line 149:
At line 48: Failed to sync with master
 
The result from queries just before the failure was:
< snip >
#  * True primary is back to connection 'master'
#  * True replica is back to connection 'slave'
##############################################
connection master;
connection slave;
CHANGE MASTER TO master_host='127.0.0.1', master_port=MASTER_PORT, master_user='root', master_use_gtid=slave_pos, master_demote_to_slave=1;
#
# Test Case 4: If gtid_slave_pos and gtid_binlog_pos are equivalent,
# MASTER_DEMOTE_TO_SLAVE=1 will not change gtid_slave_pos.
#
connection master;
# update gtid_binlog_pos and demote it (we have proven this works)
INSERT INTO t1 VALUES (3);
# Update to account for statements to verify replication in include file
CHANGE MASTER TO master_host='127.0.0.1', master_port=SLAVE_PORT, master_user='root', master_use_gtid=slave_pos, master_demote_to_slave=1;
RESET SLAVE ALL;
include/save_master_gtid.inc
connection slave;
include/sync_with_master_gtid.inc
Timeout in master_gtid_wait('0-1-10', 120), current slave GTID position is: 0-2-9.
 
More results from queries before failure can be found in /dev/shm/var/1/log/rpl_change_master_demote.log
 
 - saving '/dev/shm/var/1/log/rpl.rpl_change_master_demote-mix/' to '/dev/shm/var/log/rpl.rpl_change_master_demote-mix/'
 
Retrying test rpl.rpl_change_master_demote, attempt(2/2)...

