Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Description
I run CHANGE MASTER ... master_gtid_pos=auto;
start slave;
execute some statements on master;
wait till slave catches up with the master;
stop slave (both threads or IO only);
start slave again
=> the slave attempts to re-execute previous statements.
revision-id: knielsen@knielsen-hq.org-20130311151655-yc1i3z72v6c00pfz
revno: 3468
branch-nick: 10.0-mdev26
Test case:
--source include/master-slave.inc

--connection slave
STOP SLAVE;
--source include/wait_for_slave_to_stop.inc
RESET SLAVE ALL;

--connection master
RESET MASTER;

--connection slave
eval CHANGE MASTER TO master_host='127.0.0.1', master_port=$MASTER_MYPORT, master_user='root', master_gtid_pos=auto;
START SLAVE;
--source include/wait_for_slave_to_start.inc

--connection master
CREATE TABLE t1 (i INT);
INSERT INTO t1 VALUES (1);

--sync_slave_with_master
STOP SLAVE IO_THREAD;
--source include/wait_for_slave_io_to_stop.inc
START SLAVE IO_THREAD;
--source include/wait_for_slave_io_to_start.inc
--sync_with_master
Result:
=== SHOW SLAVE STATUS ===
---- 1. ----
Slave_IO_State Waiting for master to send event
Master_Host 127.0.0.1
Master_User root
Master_Port 16000
Connect_Retry 1
Master_Log_File master-bin.000001
Read_Master_Log_Pos 311
Relay_Log_File slave-relay-bin.000002
Relay_Log_Pos 599
Relay_Master_Log_File master-bin.000001
Slave_IO_Running Yes
Slave_SQL_Running No
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno 1050
Last_Error Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'CREATE TABLE t1 (i INT)'
Skip_Counter 0
Exec_Master_Log_Pos 311
Relay_Log_Space 1863
Until_Condition None
Until_Log_File
Until_Log_Pos 0
Master_SSL_Allowed No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master
Master_SSL_Verify_Server_Cert No
Last_IO_Errno 0
Last_IO_Error
Last_SQL_Errno 1050
Last_SQL_Error Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'CREATE TABLE t1 (i INT)'
Replicate_Ignore_Server_Ids
Master_Server_Id 1
Using_Gtid 1
=========================
Issue Links
- relates to: MDEV-26 Global transaction ID (Closed)
Right, this is an important issue; thanks for catching it.
The underlying issue here is that when the IO thread connects (or reconnects), it needs to request a position by GTID, and that position is determined by what the SQL thread last executed, not by what the IO thread last fetched.
So there are several possibilities for fetching again something that the SQL thread is in the middle of executing, or for similar races. My current code does not handle this at all. It can be especially tricky because the SQL thread may be running while the IO thread loses the connection to the master and needs to reconnect automatically.
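To make the race concrete, here is a toy Python model of it. This is not MariaDB code: `fetch_after` is a hypothetical helper, and GTIDs are simplified to plain increasing integers. It only shows how reconnecting from the SQL thread's position, while the relay log already holds later events, produces duplicates in the relay log.

```python
# Toy model of the duplicate fetch; GTIDs are simplified to increasing integers.
# Nothing here is a real MariaDB API; all names are illustrative assumptions.
master_binlog = [1, 2, 3, 4]  # events on the master, identified by sequence number


def fetch_after(binlog, last_seq):
    """Return the events a master would send for a connect at position last_seq."""
    return [seq for seq in binlog if seq > last_seq]


relay_log = []

# First connection: the slave starts from an empty position and fetches everything.
relay_log += fetch_after(master_binlog, 0)

# The SQL thread has applied only events 1 and 2 when the IO thread reconnects.
sql_executed_up_to = 2

# Reconnecting from the SQL thread's position fetches events 3 and 4 a second time.
relay_log += fetch_after(master_binlog, sql_executed_up_to)

print(relay_log)  # [1, 2, 3, 4, 3, 4]
```

If the SQL thread applies this relay log as-is, events 3 and 4 are executed twice, which is the kind of failure the "Table 't1' already exists" error above points to.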
I think I need to make the SQL thread remember what it has executed, so that it can skip events that get fetched a second time into the relay logs. This is not too hard; it only needs to be done in memory. Whenever the slave server is restarted or CHANGE MASTER is executed, we can just drop the existing relay logs (which we need to do anyway).
Still, this needs to be done carefully to handle all cases properly.
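The idea of an in-memory record of executed events could be sketched roughly as follows. This is a minimal illustration in Python, not the actual fix: the class `SqlThreadFilter`, the function `apply_relay_log`, and the representation of a GTID as a `(domain_id, seq_no)` pair with sequence numbers increasing within a domain are all assumptions made for the example.

```python
class SqlThreadFilter:
    """In-memory record of the highest sequence number applied per domain.

    Hypothetical sketch: assumes seq_no only increases within a domain.
    """

    def __init__(self):
        self.last_applied = {}  # domain_id -> highest seq_no applied so far

    def should_apply(self, domain_id, seq_no):
        # An event is new only if it is past everything applied in its domain.
        return seq_no > self.last_applied.get(domain_id, 0)

    def mark_applied(self, domain_id, seq_no):
        self.last_applied[domain_id] = seq_no

    def reset(self):
        # On CHANGE MASTER (or restart) the relay logs are dropped,
        # so the in-memory record can be dropped along with them.
        self.last_applied.clear()


def apply_relay_log(events, state):
    """Apply events in order, skipping any that were already applied."""
    applied = []
    for domain_id, seq_no in events:
        if state.should_apply(domain_id, seq_no):
            applied.append((domain_id, seq_no))
            state.mark_applied(domain_id, seq_no)
    return applied


# Relay log after a reconnect: events 2 and 3 were fetched twice.
relay = [(0, 1), (0, 2), (0, 3), (0, 2), (0, 3), (0, 4)]
state = SqlThreadFilter()
applied = apply_relay_log(relay, state)
print(applied)  # [(0, 1), (0, 2), (0, 3), (0, 4)]
```

Because the record lives only in memory, it costs nothing to persist, but it is valid only as long as the relay logs it describes; that is why the sketch clears it at the same points where the relay logs are discarded.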