[MXS-2422] Slave replication gap on binlog router failover to secondary master Created: 2019-04-04  Updated: 2019-09-04  Resolved: 2019-09-04

Status: Closed
Project: MariaDB MaxScale
Component/s: binlogrouter
Affects Version/s: 2.3.4
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Hartmut Holzgraefe
Assignee: Johan Wikman
Resolution: Won't Fix
Votes: 0
Labels: None

Attachments: File maxscale.log.bz2    
Sprint: MXS-SPRINT-80, MXS-SPRINT-81

 Description   

My replication setup is as follows:

  • a two node Galera cluster as primary and secondary master
  • a MaxScale binlog router as intermediate slave
  • a real slave, replicating from the binlog router

The two Galera nodes are configured as:

[mysqld]  
  
server-id=1
 
binlog-format=ROW
log-bin=binlog
log-slave-updates=1
 
 
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_name=test_cluster
wsrep_cluster_address=gcomm://10.47.47.11:4567,10.47.47.12:4567
wsrep_sst_method=rsync
wsrep_sst_auth=root
wsrep_gtid_mode=ON
wsrep_gtid_domain_id=23
 
wsrep_node_address=10.47.47.11
wsrep_node_name=node-1

The second node differs only in wsrep_node_address and wsrep_node_name.

The binlog router is configured like this:

[Replication]
type=service
router=binlogrouter
user=maxscale
password=secret
server_id=4711
mariadb10_master_gtid=1
binlogdir=/var/lib/maxscale/binlogs/
 
[Replication-Listener]
type=listener
service=Replication
protocol=MariaDBClient
port=4711

On the binlog router, the two nodes are set up as primary and secondary master like this:

SET @@global.gtid_slave_pos='0-1-4';
  
CHANGE MASTER      TO MASTER_HOST='node-1', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='secret', MASTER_USE_GTID=Slave_pos;
CHANGE MASTER ':2' TO MASTER_HOST='node-2', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='secret', MASTER_USE_GTID=Slave_pos;
 
START SLAVE;

I issue the following statements on node-1 and node-2:

node-1: CREATE TABLE t1(id int primary key);
node-1: INSERT INTO t1 VALUES (1);
node-2: INSERT INTO t1 VALUES (2);

The slave now also has table t1 with two rows in it.

Now I stop the MariaDB process on node-1, and check on the MaxScale node that binlog router replication has indeed switched over to node-2.

Next I insert one more row on node-2:

node-2: INSERT INTO t1 VALUES (3);

SHOW SLAVE STATUS on the slave shows that IO and SQL threads are still running, but row '3' does not show up.

I do a "STOP SLAVE; START SLAVE;" on the slave, but row '3' is still missing. Next I insert a row '4' on node-2. This row is replicated, but the test table on the slave shows only rows '1', '2' and '4'. Row '3', which was inserted after node-1 was shut down but before replication was restarted on the slave, remains missing there.
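
The gap can be confirmed by comparing the ids returned by "SELECT id FROM t1" on node-2 with those returned on the slave. A minimal sketch in Python, assuming both result sets have already been fetched into lists (the function name missing_on_slave and the hard-coded lists are illustrative and not part of the original report):

```python
def missing_on_slave(master_ids, slave_ids):
    """Return the ids present on the master but absent on the slave, sorted."""
    return sorted(set(master_ids) - set(slave_ids))


# Values taken from the scenario described above: node-2 holds rows 1 to 4,
# while the slave only shows rows 1, 2 and 4.
node2_ids = [1, 2, 3, 4]
slave_ids = [1, 2, 4]

gap = missing_on_slave(node2_ids, slave_ids)
print(gap)  # [3]
```

An empty result would mean the slave has caught up with node-2 for this table.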



 Comments   
Comment by Johan Wikman [ 2019-04-17 ]

I tried to repeat this as closely as possible, but did not get the described behaviour. Do you consistently experience that, or only occasionally?

I did notice another problem though. When I took down the node acting as the primary master (node1) and, while it was down, inserted data into the node acting as secondary master (node2), then when I brought node1 back up the data was synchronized (select * from t1 returns the same results on both), but the binlog events generated on node2 did not end up in the binlog of node1.

That is, if MaxScale for some reason has switched back from the secondary master (i.e. node2) to the primary master when the real slave connects to it, then the real slave will not receive the events that were generated when node1 was down.

Comment by Johan Wikman [ 2019-09-04 ]

The current binlog router will be deprecated and replaced with new functionality in 2.5.

See also: https://galeracluster.com/library/documentation/galera-parameters.html#gmcast-segment
