[MDEV-8945] Avoid overloading the master NIC on restarting IO_THREAD on lagging slave. Created: 2015-10-15  Updated: 2015-12-16

Status: Open
Project: MariaDB Server
Component/s: Replication
Fix Version/s: None

Type: Task Priority: Major
Reporter: Jean-François Gagné Assignee: Unassigned
Resolution: Unresolved Votes: 4
Labels: None

Issue Links:
Relates
relates to MDEV-4698 With GTID replication, relay logs can... Open
relates to MDEV-8959 change master to master_use_gtid= pu... Closed

 Description   

When GTID slave negotiation is enabled, the relay logs are wiped on starting the IO_THREAD. This might not be a big issue in most cases, but it is very bad on a lagging slave. I recently ran "STOP SLAVE; START SLAVE UNTIL ...;" on a GTID-enabled slave, and it saturated the network interface of the master for more than one hour (this slave was lagging by ~4 days and had more than 250 GB of unprocessed relay logs).

I could have been more careful and restarted only the SQL_THREAD (not the IO_THREAD), but there are still situations where restarting the IO_THREAD cannot be avoided (restarting MariaDB, for example).
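
For illustration, restarting only the SQL thread could look like the sketch below. It leaves the IO_THREAD running, so the relay logs already on disk are not wiped. The UNTIL clause of the original command is left out here because its arguments are not given.

```sql
-- Stop only the SQL thread; the IO thread stays connected to the master
-- and keeps appending to the existing relay logs.
STOP SLAVE SQL_THREAD;

-- Restart only the SQL thread; it resumes applying the relay logs
-- that are already on disk.
START SLAVE SQL_THREAD;
```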

It would be much better to avoid re-downloading binary logs when starting the IO_THREAD, as most of the relay logs on disk are still good.

More details in the following post:
http://jfg-mysql.blogspot.nl/2015/10/bad-commands-with-mariadb-gtids-2.html

Thanks,

JFG



 Comments   
Comment by Elena Stepanova [ 2015-10-17 ]

See also MDEV-4698

Comment by Daniel Black [ 2015-11-18 ]

GTID indexing (MDEV-4991) is probably the first step towards solving this.

Comment by Jean-François Gagné [ 2015-11-18 ]

I am not sure I understand how GTID indexing would solve the problem.
My understanding is that the goal of GTID indexing is to reduce disk IO load on the master when a slave connects.
This MDEV is about avoiding maxing out the master NIC when a lagging slave reconnects to the master.
In my opinion, the best way to do that is to use the relay logs already downloaded on the slave (do not wipe them).

Comment by Daniel Black [ 2015-11-18 ]

Quite right. GTID indexing only helps in this scenario with the offset into the master's first binlog, to the extent that it has already been processed on the slave.

Comment by Daniel Black [ 2015-12-16 ]

FYI: stumbled upon https://github.com/percona/percona-server/pull/240 (haven't looked at the code).

Generated at Thu Feb 08 07:30:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.