Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
The test case runs a number ($n) of INSERTs on the master, then flushes logs and tries to synchronize the slave with the master.
If it is executed with slave_parallel_threads > 0, the slave status shows an exec master position equal to the master's right away, even though the number of inserted rows still differs.
After a while the position is still the same, but the count keeps increasing (eventually reaching the expected value).
The test case shows the problem reliably for me with $n = 100 (the first count on the slave is 20 or so); if it does not for you, please try increasing $n, which should increase the probability of hitting it.
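(The report does not show how parallel replication was enabled for this run; it was presumably configured outside the test file. A sketch of the assumed slave-side setup, with an arbitrary thread count of 4:)

# Assumed setup, not part of the original report; run on the slave
# while the slave threads are stopped.
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4;
START SLAVE;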
--source include/master-slave.inc
--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc

--enable_connect_log

--connection slave
--source include/stop_slave.inc

--connection master
--disable_warnings
DROP TABLE IF EXISTS t1;
--enable_warnings

CREATE TABLE t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM;
let $n = 100;
--disable_query_log
--echo
--echo # Running $n single inserts on master
--echo
while ($n)
{
  INSERT INTO t1 VALUES();
  dec $n;
}
--enable_query_log
FLUSH LOGS;

SHOW MASTER STATUS;
--save_master_pos

--connection slave
--source include/start_slave.inc

--sync_with_master

--echo
--echo # The slave thinks it's synchronized, but the count is off
--echo

query_vertical SHOW SLAVE STATUS;
select count(*) from t1;

--echo
--echo # After waiting a bit, the position is the same, but the count increased
--echo
sleep 2;

query_vertical SHOW SLAVE STATUS;
select count(*) from t1;

connection master;
DROP TABLE t1;
sync_slave_with_master;

--disable_connect_log

--source include/rpl_end.inc
Output:
CREATE TABLE t1 (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY) ENGINE=MyISAM;

# Running 100 single inserts on master

FLUSH LOGS;
SHOW MASTER STATUS;
File Position Binlog_Do_DB Binlog_Ignore_DB
master-bin.000002 367
connection slave;
include/start_slave.inc
connection slave;

# The slave thinks it's synchronized, but the count is off

SHOW SLAVE STATUS;
Slave_IO_State Waiting for master to send event
Master_Host 127.0.0.1
Master_User root
Master_Port 16000
Connect_Retry 1
Master_Log_File master-bin.000002
Read_Master_Log_Pos 367
Relay_Log_File slave-relay-bin.000005
Relay_Log_Pos 7051
Relay_Master_Log_File master-bin.000002
Slave_IO_Running Yes
Slave_SQL_Running Yes
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno 0
Last_Error
Skip_Counter 0
Exec_Master_Log_Pos 367
Relay_Log_Space 996
Until_Condition None
Until_Log_File
Until_Log_Pos 0
Master_SSL_Allowed No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master 0
Master_SSL_Verify_Server_Cert No
Last_IO_Errno 0
Last_IO_Error
Last_SQL_Errno 0
Last_SQL_Error
Replicate_Ignore_Server_Ids
Master_Server_Id 1
Using_Gtid No
select count(*) from t1;
count(*)
29

# After waiting a bit, the position is the same, but the count increased

SHOW SLAVE STATUS;
Slave_IO_State Waiting for master to send event
Master_Host 127.0.0.1
Master_User root
Master_Port 16000
Connect_Retry 1
Master_Log_File master-bin.000002
Read_Master_Log_Pos 367
Relay_Log_File slave-relay-bin.000005
Relay_Log_Pos 22963
Relay_Master_Log_File master-bin.000002
Slave_IO_Running Yes
Slave_SQL_Running Yes
Replicate_Do_DB
Replicate_Ignore_DB
Replicate_Do_Table
Replicate_Ignore_Table
Replicate_Wild_Do_Table
Replicate_Wild_Ignore_Table
Last_Errno 0
Last_Error
Skip_Counter 0
Exec_Master_Log_Pos 367
Relay_Log_Space 996
Until_Condition None
Until_Log_File
Until_Log_Pos 0
Master_SSL_Allowed No
Master_SSL_CA_File
Master_SSL_CA_Path
Master_SSL_Cert
Master_SSL_Cipher
Master_SSL_Key
Seconds_Behind_Master 2
Master_SSL_Verify_Server_Cert No
Last_IO_Errno 0
Last_IO_Error
Last_SQL_Errno 0
Last_SQL_Error
Replicate_Ignore_Server_Ids
Master_Server_Id 1
Using_Gtid No
select count(*) from t1;
count(*)
100
connection master;
DROP TABLE t1;
connection slave;
include/rpl_end.inc
revision-id: knielsen@knielsen-hq.org-20131030065230-kp8dykgyeth6ma55
revno: 3690
branch-nick: 10.0-knielsen
BUILD/compile-pentium-debug-max-no-ndb
Issue Links
- relates to MDEV-4506 MWL#184: Parallel replication of group-committed transactions (Closed)
In parallel replication, there are two kinds of events, which are executed in different ways. Normal events that are part of event groups/transactions are executed asynchronously, by being queued for a worker thread. Other events, such as format description and rotate events, are executed directly in the driver SQL thread. If the direct execution of these other events were to update the old-style position, the position would be updated too far ahead, before the normal events already queued for a worker thread have been executed.

So I have now pushed a patch that adds some special cases to prevent such position updates ahead of time, and instead queues dummy events for the worker threads, so that they perform the position updates at the appropriate time. With this patch, the count is correct (100) in both places in the reported test case.
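The scheme can be sketched with a small standalone C++ model. This is purely illustrative and not MariaDB source; all names (PositionTracker, the job queue, the worker loop) are hypothetical, and only the numbers 100 and 367 are taken from the test case above.

// Standalone illustrative model of the fix -- hypothetical names, not MariaDB code.
// A "driver" pushes transaction work to a single worker queue. Updating the
// shared position directly from the driver (the buggy behaviour) lets the
// position run ahead of still-queued transactions; pushing a dummy event that
// updates the position from the worker keeps the updates in execution order.
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

struct PositionTracker {              // analogue of Exec_Master_Log_Pos
  std::mutex m;
  long exec_pos = 0;
  void set(long p) { std::lock_guard<std::mutex> g(m); exec_pos = p; }
  long get()       { std::lock_guard<std::mutex> g(m); return exec_pos; }
};

int main() {
  PositionTracker pos;
  std::queue<std::function<void()>> q;  // work queued for the worker thread
  std::mutex qm;
  std::condition_variable cv;
  bool done = false;

  std::thread worker([&] {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lk(qm);
        cv.wait(lk, [&] { return done || !q.empty(); });
        if (q.empty()) return;          // done and nothing left to execute
        job = std::move(q.front());
        q.pop();
      }
      job();                            // execute one queued "transaction"
    }
  });

  // Driver thread: queue 100 "transactions"; pretend each ends at position i.
  for (long i = 1; i <= 100; ++i) {
    std::lock_guard<std::mutex> g(qm);
    q.push([&pos, i] { pos.set(i); });  // the worker advances the position
    cv.notify_one();
  }

  // Buggy behaviour would be:   pos.set(367);
  // i.e. the driver updates the position directly for a control event, ahead
  // of the transactions still sitting in the queue.
  // Fixed behaviour: queue a dummy event, so the position only reaches 367
  // after everything queued before it has been executed.
  {
    std::lock_guard<std::mutex> g(qm);
    q.push([&pos] { pos.set(367); });
    cv.notify_one();
  }

  std::printf("position seen right now:         %ld\n", pos.get());

  {
    std::lock_guard<std::mutex> g(qm);
    done = true;
    cv.notify_one();
  }
  worker.join();
  std::printf("position after queue is drained: %ld\n", pos.get());
  return 0;
}

The essential point is that the position update for the directly-executed event travels through the same queue as the queued transactions, so the externally visible position can never run ahead of the work that has actually been executed.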