[MDEV-22605] SHOW SLAVE STATUS does not correctly reflect broken replication Created: 2020-05-17  Updated: 2020-05-26

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2.27
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: zhang Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: replication
Environment:

10.1.34 to 10.2.27 bin log pos multi-source replication



 Description   

MariaDB SHOW SLAVE STATUS does not correctly reflect broken replication

Here is the phenomenon:

I use multi-source replication for some purpose, and I found that one connection had stopped replicating data from the master node, but neither `show slave status\G` nor `show all slaves status\G` showed any error.

The weird thing is that not only does `SLAVE_IO_RUNNING/SLAVE_SQL_RUNNING` show `Yes`, but `Exec_Master_Log_Pos/Read_Master_Log_Pos` also keeps increasing... So I went to check the relay log on the slave, and I found that DDL which had been written to the relay log was never executed on the slave node itself, and the server log does not show any error.

I have tested it several times and I think I have found out how to reproduce it. I believe it is not the same as MDEV-21687 or MDEV-10703. Apologies if other issues already cover this and I missed them.

The master node version I am using is 10.1.34 and the slave is 10.2.27.

1. create a master and slave node;
2. use `mysqldump` to copy the data and record master_log_position;
3. > `set @@default_master_connection='test-master';`
4. > `set global replicate_do_db='xxx';`
5. > `change master to 'xxxxxx'...`
6. > `start slave 'test-master';`

Everything is okay up to this point.
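Spelled out with hypothetical placeholder values (host, credentials, database name and binlog position are mine, not the reporter's elided `'xxx'` values), the setup in steps 3-6 looks roughly like this:

```sql
-- All names/values below are placeholders for illustration only.
SET @@default_master_connection = 'test-master';

-- With default_master_connection set, this applies to that connection.
SET GLOBAL replicate_do_db = 'app_db';

CHANGE MASTER 'test-master' TO
    MASTER_HOST = '10.0.0.1',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000042',  -- position recorded from mysqldump
    MASTER_LOG_POS  = 12345;

START SLAVE 'test-master';
```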

7. add `test-master.replicate_do_db='xxx'` to my.cnf/mysqld.cnf;
8. systemctl restart mariadb.service

9. > show all slaves status\G

Replication for that connection is actually stopped (events are not applied), but the status columns show `Yes`, and `Exec_Master_Log_Pos` is still incrementing.
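For reference, the per-connection option added in step 7 would look roughly like this in the config file (the database name is a placeholder, not the reporter's actual value):

```ini
# Hypothetical my.cnf / mysqld.cnf fragment.
# MariaDB multi-source replication allows prefixing replication options
# with the connection name.
[mysqld]
test-master.replicate_do_db = app_db
```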

I think the key is the `connection_name`. I have tried `reset slave 'connection_name' all`, then recreated the master connection under a different name and started the slave at the position where it first stalled (the data was not growing in the first place). That does fix the non-growing database left behind by the old `connection_name`. But if you do not use `reset slave xxx all` to clean up all the old state before starting a new connection replicating the same database, or if you just add `connection_name.replicate_do_db='xxxxx'` to your config file and restart the slave, the problem reproduces.
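The workaround described above can be sketched as follows (hypothetical placeholder names and positions throughout; the new connection name is my own illustration):

```sql
-- Discard all replication state for the stalled connection.
STOP SLAVE 'test-master';
RESET SLAVE 'test-master' ALL;

-- Re-create the connection under a *different* name, restarting from the
-- original recorded position (placeholders, not the reporter's values).
SET @@default_master_connection = 'test-master2';
SET GLOBAL replicate_do_db = 'app_db';

CHANGE MASTER 'test-master2' TO
    MASTER_HOST = '10.0.0.1',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000042',
    MASTER_LOG_POS  = 12345;

START SLAVE 'test-master2';
```

Reusing the same connection name without `RESET SLAVE ... ALL`, or adding the `replicate_do_db` option under the old name in the config file, brings the problem back.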


Generated at Thu Feb 08 09:16:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.