[MDEV-15892] DO_DOMAIN_IDS + master_delay + multi-master causes slaves to crash Created: 2018-04-17  Updated: 2020-11-02  Resolved: 2020-11-02

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2.10, 10.2.12
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Devin Yu Assignee: Sachin Setiya (Inactive)
Resolution: Incomplete Votes: 0
Labels: need_feedback, replication, replication-filter
Environment:

3 galera nodes:
Mysql1
Mysql2
Mysql3

1 delay slave node:
Mysql4


Attachments: File my_galera1.cnf     File my_galera2.cnf     File my_galera3.cnf     File my_slave.cnf    

 Description   

1. Set up replication on the slave in any of the following ways; each works fine.

a) test 1
CHANGE MASTER 'Mysql1' TO master_host='Mysql1',master_user='replication',master_password='replication',master_use_gtid=slave_pos;
 
b) test 2
CHANGE MASTER 'Mysql1' TO master_host='Mysql1',master_user='replication',master_password='replication',master_use_gtid=slave_pos,DO_DOMAIN_IDS=(10);
 
c) test 3
CHANGE MASTER 'Mysql1' TO master_host='Mysql1',master_user='replication',master_password='replication',master_use_gtid=slave_pos,master_delay=10;

2. Set up replication (DO_DOMAIN_IDS + master_delay) on the slave; there is a problem with non-DO_DOMAIN_IDS when applying on the slave.

#Mysql4 (Slave) , test 4
CHANGE MASTER 'Mysql1' TO master_host='Mysql1',master_user='replication',master_password='replication',master_use_gtid=slave_pos,DO_DOMAIN_IDS=(10),master_delay=10;
 
#Mysql1 (Master)
create table test.a(a int) engine=innodb;
create table test.b(a int) engine=myisam;
 
watch -n 1 -e "mysql -uroot test -e \"insert into a select 1;insert into b select 1;select count(*) as a from a;select count(*) as b from b\""
 
#We can see that the GTIDs of both domain 1 and domain 10 are growing
$ mysql -uroot -e "show global variables like 'gtid_current_pos'"; sleep 5; mysql -uroot -e "show global variables like 'gtid_current_pos'"
+------------------+------------------------+
| Variable_name    | Value                  |
+------------------+------------------------+
| gtid_current_pos | 1-1-223546,10-1-233617 |
+------------------+------------------------+
+------------------+------------------------+
| Variable_name    | Value                  |
+------------------+------------------------+
| gtid_current_pos | 1-1-223551,10-1-233622 |
+------------------+------------------------+
 
#Mysql4 (Slave)
#We can see that the GTID of domain 10 is growing normally, but the GTID of domain 1 has stopped advancing.
#If expire_logs_days is set on the master, slave will fail to restart after expire_logs_days days.
$ mysql -uroot -e "show all slaves status\G" | grep -i gtid; sleep 5; mysql -uroot -e "show all slaves status\G" | grep -i gtid
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-1-223742,10-1-233813,2-2-105
               Gtid_Slave_Pos: 1-1-223312,2-2-105,10-1-233803
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 1-1-223747,10-1-233818,2-2-105
               Gtid_Slave_Pos: 1-1-223312,2-2-105,10-1-233808
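
To make the stalled domain easier to spot, the two positions can be compared per domain. The following is only a rough sketch: gtid_lag is a hypothetical helper name (not a MariaDB tool), and it assumes both positions are plain comma-separated domain-server-sequence lists as printed by "show all slaves status".

```shell
#!/usr/bin/env bash
# gtid_lag IO_POS SLAVE_POS (hypothetical helper): for every domain in
# IO_POS, print how far the applied sequence number in SLAVE_POS trails it.
gtid_lag() {
    awk -v io="$1" -v sl="$2" 'BEGIN {
        # Index the applied sequence numbers by domain ID.
        m = split(sl, b, ",")
        for (i = 1; i <= m; i++) { split(b[i], g, "-"); sl_seq[g[1]] = g[3] }
        # Walk the received positions in their original order.
        n = split(io, a, ",")
        for (i = 1; i <= n; i++) {
            split(a[i], g, "-")
            d = g[1]
            applied = (d in sl_seq) ? sl_seq[d] : 0
            printf "domain %s: io=%s applied=%s behind=%d\n", d, g[3], applied, g[3] - applied
        }
    }'
}

# Values taken from the second status sample above.
gtid_lag "1-1-223747,10-1-233818,2-2-105" "1-1-223312,2-2-105,10-1-233808"
# domain 1: io=223747 applied=223312 behind=435
# domain 10: io=233818 applied=233808 behind=10
# domain 2: io=105 applied=105 behind=0
```

Running it on two samples taken a few seconds apart shows whether "behind" keeps growing for a domain, which is the case for domain 1 here.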

3. Setting up replication (DO_DOMAIN_IDS + master_delay + multi-master) on the slave causes queries such as "show all slaves status; show global variables;" to slow down. A few days later, the slave was killed by the OOM killer in my UAT environment.

#Mysql4 (Slave) , test 5
CHANGE MASTER 'Mysql1' TO master_host='Mysql1',master_user='replication',master_password='replication',master_use_gtid=slave_pos,DO_DOMAIN_IDS=(10),master_delay=10;
CHANGE MASTER 'Mysql2' TO master_host='Mysql2',master_user='replication',master_password='replication',master_use_gtid=slave_pos,DO_DOMAIN_IDS=(10),master_delay=10;
CHANGE MASTER 'Mysql3' TO master_host='Mysql3',master_user='replication',master_password='replication',master_use_gtid=slave_pos,DO_DOMAIN_IDS=(10),master_delay=10;
 
#Mysql1 (Master)
create table test.a(a int) engine=innodb;
create table test.b(a int) engine=myisam;
 
watch -n 1 -e "mysql -uroot test -e \"insert into a select 1;insert into b select 1;select count(*) as a from a;select count(*) as b from b\""
 
#Mysql4 (Slave)
#"show all slaves status\G" is very very slow.
> show all slaves status\G
...
3 rows in set (20.21 sec)
 
> show global variables;
...
632 rows in set (16.52 sec)



 Comments   
Comment by Andrei Elkin [ 2019-12-02 ]

920895156@qq.com If I understood correctly, the Mysql4 slave is configured (see 1.b) with DO_DOMAIN_IDS=(10).
I can't understand then, sorry, why it is 'a problem'

2. Set up replication (DO_DOMAIN_IDS + master_delay) on the slave; there is a problem with non-DO_DOMAIN_IDS when applying on the slave.

that another domain is not applied? https://mariadb.com/kb/en/library/change-master-to/#do_domain_ids describes

DO_DOMAIN_IDS
MariaDB starting with 10.1.2

The DO_DOMAIN_IDS option for CHANGE MASTER was first added in MariaDB 10.1.2.

The DO_DOMAIN_IDS option for CHANGE MASTER can be used to configure a replication slave to only apply binary log events if the transaction's GTID is in a specific gtid_domain_id value. Filtered binary log events will not get logged to the slave’s relay log, and they will not be applied by the slave.

Comment by Andrei Elkin [ 2020-11-02 ]

Maybe the following

#If expire_logs_days is set on the master, slave will fail to restart after expire_logs_days days.

clarifies what was meant by 'a problem with non-DO_DOMAIN_IDS', which would clearly be not a bug but
a GTID misunderstanding on the part of the reporter.

The user apparently needed to clear the slave GTID state of every domain other than domain 10 before reconnecting.
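
One possible reading of that suggestion, sketched with the positions from the description: drop every entry except domain 10 from the slave position and set gtid_slave_pos to the result while all slave threads are stopped. keep_domains is a hypothetical helper name, and whether this is exactly the procedure meant here is an assumption. The script only prints the statements for review and does not connect to any server.

```shell
#!/usr/bin/env bash
# keep_domains POS DOMAINS (hypothetical helper): print the entries of the
# comma-separated GTID list POS whose domain ID is listed in DOMAINS.
keep_domains() {
    awk -v pos="$1" -v keep="$2" 'BEGIN {
        k = split(keep, kd, ",")
        for (i = 1; i <= k; i++) wanted[kd[i]] = 1
        n = split(pos, p, ",")
        out = ""
        for (i = 1; i <= n; i++) {
            split(p[i], g, "-")
            if (g[1] in wanted) out = out (out == "" ? "" : ",") p[i]
        }
        print out
    }'
}

# Gtid_Slave_Pos from the report, reduced to the domain in DO_DOMAIN_IDS.
new_pos=$(keep_domains "1-1-223312,2-2-105,10-1-233808" "10")

# Statements to review and run by hand on the slave (Mysql4).
printf "STOP ALL SLAVES;\nSET GLOBAL gtid_slave_pos = '%s';\nSTART ALL SLAVES;\n" "$new_pos"
```

Changing gtid_slave_pos discards the recorded position of the other domains, so the output should be checked against the current "show all slaves status" before anything is applied.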
