Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.0.14
Environment: CentOS Linux release 7.0.1406 (Core) 3.10.0-123.6.3.el7.x86_64
Description
When a GTID slave is switched with CHANGE MASTER TO to another master that is behind the slave, it neither fails nor outputs any error message.
The slave fails later with fatal error 1236, once the new master manages to sync with the original master.
If START SLAVE is executed after that error, the slave starts normally, without errors.
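One way to spot the situation described above, where the new master is behind the slave, is to compare the two GTID positions domain by domain before running CHANGE MASTER TO. The following is a minimal Python sketch, not MariaDB's own check. The function names are hypothetical, the inputs are assumed to be the values of @@gtid_slave_pos on the slave and @@gtid_binlog_pos on the candidate master, and comparing only seq_no within a domain (ignoring server_id) is a simplification.

```python
# Minimal sketch (hypothetical helper names): find GTID domains in which a
# candidate master is behind the slave. Positions are MariaDB-style
# comma-separated "domain-server-seq" lists. Comparing only seq_no per domain
# is a simplifying assumption.

def parse_gtid_pos(pos: str) -> dict[int, tuple[int, int]]:
    """Map domain_id -> (server_id, seq_no) for a comma-separated GTID list."""
    result: dict[int, tuple[int, int]] = {}
    for item in pos.split(","):
        item = item.strip()
        if not item:
            continue
        domain, server, seq = (int(x) for x in item.split("-"))
        result[domain] = (server, seq)
    return result


def domains_master_is_behind(slave_pos: str, master_pos: str) -> list[int]:
    """Return domain IDs where the master's seq_no is lower than the slave's,
    or where the master has no GTID at all for that domain."""
    slave = parse_gtid_pos(slave_pos)
    master = parse_gtid_pos(master_pos)
    behind = []
    for domain, (_server, seq) in sorted(slave.items()):
        if domain not in master or master[domain][1] < seq:
            behind.append(domain)
    return behind


if __name__ == "__main__":
    # Slave position from the error log in this report; the master
    # position is hypothetical.
    print(domains_master_is_behind("1-1-40,0-1-335", "1-1-40,0-1-300"))  # [0]
```

With the slave position from this report and a hypothetical master position of 1-1-40,0-1-300, the sketch reports domain 0 as the domain in which the master lags.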
Slave error log:
=================================================
141020 15:04:12 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='192.168.122.103', master_port='3306', master_log_file='massimo-8012-master-bin.000016', master_log_pos='1748'. New state master_host='192.168.122.240', master_port='3306', master_log_file='', master_log_pos='4'.
141020 15:04:12 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
141020 15:04:12 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './massimo-8012-slave-relay-bin.000001' position: 4; GTID position '1-1-40,0-1-335'
141020 15:04:12 [Note] Slave I/O thread: connected to master 'repl@192.168.122.240:3306',replication starts at GTID position '1-1-40,0-1-335'
141020 15:09:31 [ERROR] Error reading packet from server: Error: connecting slave requested to start from GTID 0-1-335, which is not in the master's binlog ( server_errno=1236)
141020 15:09:31 [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Error: connecting slave requested to start from GTID 0-1-335, which is not in the master's binlog', Internal MariaDB error code: 1236
141020 15:09:31 [Note] Slave I/O thread exiting, read up to log 'massimo-8012-slave-bin.000001', position 325; GTID position 1-1-40,0-1-335
141020 15:13:53 [Note] Slave I/O thread: connected to master 'repl@192.168.122.240:3306',replication starts at GTID position '1-1-40,0-1-335'
=================================================
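The log shows that both I/O thread connections, at 15:04:12 and 15:13:53, requested the same GTID position, 1-1-40,0-1-335, and that the master rejected GTID 0-1-335 in between. The following Python sketch extracts those values from log text with regular expressions. The patterns and function names are assumptions based only on the line format in this log and may not match the logs of other MariaDB versions.

```python
import re

# Patterns assumed from the log lines in this report, e.g.
# "YYMMDD HH:MM:SS [Note] ... replication starts at GTID position 'd-s-n,...'"
GTID = r"\d+-\d+-\d+"
CONNECT_RE = re.compile(
    r"^(\d{6} \d{2}:\d{2}:\d{2}) .*replication starts at GTID position "
    r"'(" + GTID + r"(?:," + GTID + r")*)'"
)
ERROR_RE = re.compile(r"requested to start from GTID (" + GTID + r")")


def connection_positions(log_text: str) -> list[tuple[str, str]]:
    """Return (timestamp, GTID position) for each I/O thread connection."""
    positions = []
    for line in log_text.splitlines():
        m = CONNECT_RE.search(line)
        if m:
            positions.append((m.group(1), m.group(2)))
    return positions


def missing_gtids(log_text: str) -> list[str]:
    """Return GTIDs that the master reported as not in its binlog."""
    return ERROR_RE.findall(log_text)


# Excerpt copied from the slave error log in this report.
SAMPLE = """\
141020 15:04:12 [Note] Slave I/O thread: connected to master 'repl@192.168.122.240:3306',replication starts at GTID position '1-1-40,0-1-335'
141020 15:09:31 [ERROR] Error reading packet from server: Error: connecting slave requested to start from GTID 0-1-335, which is not in the master's binlog ( server_errno=1236)
141020 15:13:53 [Note] Slave I/O thread: connected to master 'repl@192.168.122.240:3306',replication starts at GTID position '1-1-40,0-1-335'
"""

if __name__ == "__main__":
    for ts, pos in connection_positions(SAMPLE):
        print(ts, pos)
    print(missing_gtids(SAMPLE))  # ['0-1-335']
```

Running it on the excerpt confirms that the requested position did not change between the first connection and the reconnection after the error.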
I have tcpdump captures of port 3306 traffic from both the first and second master instances, if needed.
Hi Ivan,
Can you clarify the workflow? There are too many unknowns here.
What was the full replication topology before the master change? Was it the only change in the topology?
You have two GTIDs in the slave position, 1-1-40,0-1-335. Where does each chain (for domains 0 and 1) come from? If both came from the old master, how did that happen: were the chains interleaved, was it domain 1 first and domain 0 later, or vice versa, or something else?
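One way to answer the interleaving question is to list the GTIDs in binlog order (for example, by reading the GTID events printed by mysqlbinlog) and collapse consecutive runs of the same domain. The sketch below is hypothetical: the function name and the sample sequences are illustrative, not taken from the binlogs of the servers in this report.

```python
def domain_order(gtids: list[str]) -> str:
    """Classify how GTID domains are ordered in a binlog-ordered GTID list.

    Each GTID is a MariaDB-style "domain-server-seq" string.
    """
    # Keep only the domain ID of each GTID, preserving order.
    domains = [int(g.split("-")[0]) for g in gtids]

    # Collapse consecutive repeats: [0, 0, 1, 1, 0] -> [0, 1, 0].
    runs: list[int] = []
    for d in domains:
        if not runs or runs[-1] != d:
            runs.append(d)

    if not runs:
        return "empty"
    if len(runs) == 1:
        return "single domain"
    # If no domain reappears after another domain, the chains were sequential.
    if len(runs) == len(set(runs)):
        return "sequential: " + " then ".join(str(d) for d in runs)
    return "interleaved"


if __name__ == "__main__":
    # Hypothetical sequences, not taken from the servers in this report.
    print(domain_order(["1-1-1", "1-1-2", "0-1-1", "0-1-2"]))  # sequential: 1 then 0
    print(domain_order(["0-1-1", "1-1-1", "0-1-2"]))          # interleaved
```

Collapsing runs keeps the check independent of how many transactions each chain contains; only the order in which domains alternate matters.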
What are the domain IDs and server IDs on the involved servers?
How do you know that the error happened exactly after the new master caught up with the old master?
If you could provide the full binary logs from all three servers, that would be perfect.