[MDEV-29517] rpl.rpl_change_master_demote sporadically fails in BB Created: 2022-09-12  Updated: 2024-01-25  Resolved: 2022-10-03

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: 10.10, 10.11
Fix Version/s: 10.10.2, 10.11.1

Type: Bug Priority: Major
Reporter: Angelique Sklavounos (Inactive) Assignee: Angelique Sklavounos (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: File mysqld.1.err.2     File mysqld.2.err.2    
Issue Links:
Relates
relates to MDEV-33316 rpl_change_master_demote binlog race ... In Progress

 Description   

https://buildbot.mariadb.org/#/builders/98/builds/9752

10.11 16c971875

rpl.rpl_change_master_demote 'mix'       w2 [ fail ]
        Test ended at 2022-09-12 06:32:13
 
CURRENT_TEST: rpl.rpl_change_master_demote
mysqltest: In included file "./include/wait_for_slave_param.inc": 
included from ./include/wait_for_slave_io_to_start.inc at line 40:
included from ./include/wait_for_slave_to_start.inc at line 26:
included from ./include/start_slave.inc at line 35:
included from /home/buildbot/aarch64-ubuntu-2004/build/mysql-test/suite/rpl/include/rpl_change_master_demote.inc at line 44:
included from /home/buildbot/aarch64-ubuntu-2004/build/mysql-test/suite/rpl/t/rpl_change_master_demote.test at line 147:
At line 119: Timeout in include/wait_for_slave_param.inc
 
The result from queries just before the failure was:
< snip >
master-bin.000001	256	Gtid_list	1	285	[]
master-bin.000001	285	Binlog_checkpoint	1	329	master-bin.000001
master-bin.000001	329	Gtid	1	371	GTID 0-1-1
master-bin.000001	371	Query	1	470	use `test`; CREATE TABLE t1 (a int)
master-bin.000001	470	Gtid	1	512	BEGIN GTID 0-1-2
master-bin.000001	512	Query	1	604	use `test`; INSERT INTO t1 VALUES (1)
master-bin.000001	604	Query	1	677	COMMIT
master-bin.000001	677	Gtid	1	719	BEGIN GTID 0-1-6
master-bin.000001	719	Query	1	811	use `test`; INSERT INTO t1 VALUES (2)
master-bin.000001	811	Query	1	884	COMMIT
master-bin.000001	884	Gtid	1	926	BEGIN GTID 0-1-10
master-bin.000001	926	Query	1	1018	use `test`; INSERT INTO t1 VALUES (3)
master-bin.000001	1018	Query	1	1091	COMMIT
 
**** SHOW RELAYLOG EVENTS on server_1 ****
relaylog_name = 'master-relay-bin.000001'
SHOW RELAYLOG EVENTS IN 'master-relay-bin.000001';
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-relay-bin.000001	4	Format_desc	1	256	Server ver: 10.11.0-MariaDB-log, Binlog ver: 4
connection master;

This happens during Test Case 4, when the primary has changed to become the new replica and it is attempted to start it.

The primary/server 1 error log shows:

2022-09-12  6:26:49 9 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16081', master_log_file='', master_log_pos='4'.
2022-09-12  6:26:49 9 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
2022-09-12  6:26:49 23 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16081' in log '' at position 4
2022-09-12  6:26:49 23 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16081',replication starts at GTID position '0-1-10'
2022-09-12  6:26:49 23 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-1-10, which is not in the master's binlog', Internal MariaDB error code: 1236
2022-09-12  6:26:49 23 [Note] Slave I/O thread exiting, read up to log 'FIRST', position 4; GTID position 0-1-10, master 127.0.0.1:16081
2022-09-12  6:26:49 24 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './master-relay-bin.000001' position: 4; GTID position '0-1-10'
2022-09-12  6:32:12 0 [Note] /home/buildbot/aarch64-ubuntu-2004/build/sql/mariadbd (initiated by: unknown): Normal shutdown
2022-09-12  6:32:12 24 [Note] Error reading relay log event: slave SQL thread was killed
2022-09-12  6:32:12 24 [Note] Slave SQL thread exiting, replication stopped in log 'FIRST' at position 4; GTID position '0-1-10', master: 127.0.0.1:16081
2022-09-12  6:32:12 0 [Note] /home/buildbot/aarch64-ubuntu-2004/build/sql/mariadbd: Shutdown complete

The replica/server 2 error log shows:

2022-09-12  6:26:49 24 [Note] Start binlog_dump to slave_server(1), pos(, 4), using_gtid(1), gtid('0-1-10')
2022-09-12  6:32:12 22 [Note] Slave: received end packet from server, apparent master shutdown: 
2022-09-12  6:32:12 22 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'master-bin.000001' at position 1091; GTID position '0-1-10'
2022-09-12  6:32:12 22 [ERROR] Slave I/O: error reconnecting to master 'root@127.0.0.1:16080' - retry-time: 1  maximum-retries: 10  message: Can't connect to server on '127.0.0.1' (111 "Connection refused"), Internal MariaDB error code: 2003
2022-09-12  6:32:12 0 [Note] /home/buildbot/aarch64-ubuntu-2004/build/sql/mariadbd (initiated by: unknown): Normal shutdown
2022-09-12  6:32:12 23 [Note] Error reading relay log event: slave SQL thread was killed
2022-09-12  6:32:12 23 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 1091; GTID position '0-1-10', master: 127.0.0.1:16080
2022-09-12  6:32:12 22 [Note] Slave I/O thread killed during or after a reconnect done to recover from failed read
2022-09-12  6:32:12 22 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 1091; GTID position 0-1-10, master 127.0.0.1:16080
2022-09-12  6:32:12 0 [Note] InnoDB: FTS optimize thread exiting.
2022-09-12  6:32:12 0 [Note] InnoDB: Starting shutdown...
2022-09-12  6:32:12 0 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/var_auto_caLU/2/mysqld.2/data/ib_buffer_pool
2022-09-12  6:32:12 0 [Note] InnoDB: Restricted to 126 pages due to innodb_buf_pool_dump_pct=25
2022-09-12  6:32:12 0 [Note] InnoDB: Buffer pool(s) dump completed at 220912  6:32:12
2022-09-12  6:32:13 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2022-09-12  6:32:13 0 [Note] InnoDB: Shutdown completed; log sequence number 48342; transaction id 17
2022-09-12  6:32:13 0 [Note] /home/buildbot/aarch64-ubuntu-2004/build/sql/mariadbd: Shutdown complete

Error logs attached.


Generated at Thu Feb 08 10:09:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.