[MDEV-12278] Adding a 10.1.22 slave to 10.1.21 master freezing all dump threads
Created: 2017-03-16  Updated: 2017-03-17  Resolved: 2017-03-17

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.22
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: VAROQUI Stephane Assignee: Unassigned
Resolution: Not a Bug Votes: 2
Labels: None


 Description   

The setup is master-slave: 1 master -> 3 slaves, all nodes running release 10.1.21.
We set up a new slave running 10.1.22 via mysqldump --gtid --master-data --single-transaction.
As long as replication is running on the new 10.1.22 slave (START SLAVE), the IO threads of all the other 10.1.21 slaves stop replicating from the master.
Running STOP SLAVE on the new slave makes all the other slaves recover.

The relevant replication settings are:

plugin_load = "semisync_master.so;semisync_slave.so;sql_errlog.so"
rpl_semi_sync_master = ON
rpl_semi_sync_slave = ON
loose_rpl_semi_sync_master_enabled = ON
loose_rpl_semi_sync_slave_enabled = ON
slave_parallel_mode = optimistic
slave_parallel_threads = 4
binlog_format = ROW
binlog_checksum = 1
replicate_annotate_row_events = 1
log_slow_slave_statements = 1
log_slow_verbosity=query_plan,explain
log_warnings = 2
optimizer_switch='orderby_uses_equalities=on'
 
innodb_defragement=1
innodb_purge_threads = 8
innodb_print_all_deadlocks = 1
innodb_flush_neighbors = 1
innodb_stats_on_metadata = 0

I noted that the master holds one month of binary logs (22G); this could be investigated.

We have tried a few things on this new slave to understand what could be wrong:

Disabled GTID via master_use_gtid=no and gave old-style binlog coordinates
Issue still visible

Disabled all semisync plugins
Issue still visible

Disabled the parameters

binlog_checksum = 1
replicate_annotate_row_events = 1
slave_parallel_mode = optimistic
slave_parallel_threads = 4

Issue still visible

More information may be provided via their support contract; I will link it to this JIRA when I get some feedback.

Any suggestion on how to move forward is welcome



 Comments   
Comment by Pierre ANTOINE [ 2017-03-16 ]

If you need more information, such as the full configs or the output of particular commands, just ask me.

Pierre
CTO Kang

Comment by Daniel Black [ 2017-03-16 ]

Output from the slave at the time of the lockup would be useful, e.g. a backtrace of all threads (replace program with the path to the mysqld binary):

gdb --batch --eval-command="thread apply all bt" program $(pgrep -xn mysqld)

Comment by Elena Stepanova [ 2017-03-16 ]

pierre_kang,

Yes, could you please provide full configs? The one in the description cannot be right, it has a typo in it – innodb_defragement, InnoDB wouldn't even start (and if InnoDB is a default engine in your configuration, the server wouldn't start either).

Also, the description says "other 10.1.21 slaves IO threads stopped replicating from the Master" – what does it mean, exactly? What is the slave status on the stalled slaves, especially for the IO thread, and what does SHOW PROCESSLIST on master say for Binlog Dump threads?
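The innodb_defragement typo spotted above (the MariaDB option is innodb_defragment) is the kind of mistake a quick script can flag before a restart. A minimal sketch, assuming a hypothetical KNOWN_OPTIONS set; in practice that list would be taken from the server's own help output (mysqld --verbose --help), which is an assumed workflow, not something from this thread. It uses difflib fuzzy matching to suggest the closest valid name.

```python
import difflib

# Hypothetical subset of valid option names, for illustration only.
KNOWN_OPTIONS = {
    "innodb_defragment",
    "innodb_purge_threads",
    "innodb_print_all_deadlocks",
    "innodb_flush_neighbors",
    "innodb_stats_on_metadata",
}


def option_names(cnf_text):
    """Yield normalized option names from my.cnf-style lines."""
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[", "!")):
            continue  # blank line, comment, [group] header or !include
        name = line.split("=", 1)[0].strip().replace("-", "_").lower()
        if name.startswith("loose_"):
            # loose_ only silences the error; the underlying name is still checked
            name = name[len("loose_"):]
        yield name


def suggest_fixes(cnf_text, known=KNOWN_OPTIONS):
    """Map each unknown option name to its closest known name, or None."""
    suggestions = {}
    for name in option_names(cnf_text):
        if name not in known:
            matches = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
            suggestions[name] = matches[0] if matches else None
    return suggestions


if __name__ == "__main__":
    cnf = "innodb_defragement=1\ninnodb_purge_threads = 8\n"
    print(suggest_fixes(cnf))  # {'innodb_defragement': 'innodb_defragment'}
```

A name with no close match maps to None, so a genuinely unfamiliar option is still reported rather than silently passed.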

Comment by VAROQUI Stephane [ 2017-03-16 ]

https://support.mariadb.com/view.php?id=13738

Comment by Pierre ANTOINE [ 2017-03-16 ]

Problem solved:

Two nodes have the same server-id:
server-id = 110

Comment by VAROQUI Stephane [ 2017-03-16 ]

We found out that the new server was provisioned with the same server_id as the existing slave on which we had been monitoring replication.

So it looks like only one of the slaves sharing the same server_id is notified of events.
Is there any way to prevent this type of mistake, where two slaves of the same replication source have an identical server_id?

This issue can be closed.
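One way to catch this mistake before running START SLAVE, sketched here as a hypothetical pre-flight check rather than any MariaDB feature: read each node's option file and flag server_id values used by more than one host. The host names, the ids other than 110, and the SERVER_GROUPS list are assumptions for illustration. The parser treats server-id and server_id as the same option, skips !include lines, and ignores hosts that set no explicit server_id.

```python
import configparser
from collections import defaultdict
from typing import Dict, List, Optional

# Option groups read by the server; an assumed, non-exhaustive list.
SERVER_GROUPS = ("mysqld", "server", "mariadb")


def read_server_id(cnf_text: str) -> Optional[int]:
    """Return the server_id set in my.cnf-style text, or None if it is absent."""
    # configparser cannot handle !include / !includedir directives, so drop them.
    lines = [ln for ln in cnf_text.splitlines() if not ln.lstrip().startswith("!")]
    parser = configparser.ConfigParser(
        allow_no_value=True,             # bare flags such as skip-name-resolve
        strict=False,                    # tolerate repeated options or groups
        interpolation=None,              # take values literally
        inline_comment_prefixes=("#",),  # "server-id = 7 # note"
    )
    parser.read_string("\n".join(lines))
    server_id = None
    for group in SERVER_GROUPS:
        if not parser.has_section(group):
            continue
        for key, value in parser.items(group):
            # server-id and server_id name the same option; the last one seen wins here.
            if key.replace("-", "_") == "server_id" and value is not None:
                server_id = int(value)
    return server_id


def find_duplicate_server_ids(configs: Dict[str, str]) -> Dict[int, List[str]]:
    """Map every server_id used by more than one host to those hosts."""
    hosts_by_id: Dict[int, List[str]] = defaultdict(list)
    for host, cnf_text in configs.items():
        server_id = read_server_id(cnf_text)
        if server_id is not None:  # hosts without an explicit server_id are skipped
            hosts_by_id[server_id].append(host)
    return {sid: sorted(hosts) for sid, hosts in hosts_by_id.items() if len(hosts) > 1}


if __name__ == "__main__":
    # Hypothetical hosts; only the clash on 110 mirrors this report.
    configs = {
        "master": "[mysqld]\nserver-id = 100\nlog_bin = mariadb-bin\n",
        "slave1": "[mysqld]\nserver-id = 110\n",
        "slave2": "[mysqld]\nserver_id = 120\n",
        "new-slave": "[mysqld]\nserver-id = 110\nskip-name-resolve\n",
    }
    print(find_duplicate_server_ids(configs))  # {110: ['new-slave', 'slave1']}
```

Checking the option files catches the clash before the new slave ever connects; a check against the running servers (SELECT @@server_id on each node) would also catch values set at runtime, at the cost of needing a database driver.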

Comment by Elena Stepanova [ 2017-03-17 ]

stephane@skysql.com, if you want to request the feature you mentioned above, please file a separate JIRA item (task), I don't think it makes sense to convert this one, it has too much information irrelevant to the request.

Generated at Thu Feb 08 07:56:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.