[MDEV-8945] Avoid overloading the master NIC on restarting IO_THREAD on lagging slave.

Details

Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Fix Version/s: None
Component/s: Replication
Labels: None

Description

When GTID slave negotiation is enabled, the relay logs are wiped on starting the IO_THREAD. This might not be a big issue in most cases, but it is very bad on a lagging slave. I recently ran "STOP SLAVE; START SLAVE UNTIL ...;" on a GTID-enabled slave, and it saturated the network interface of the master for more than one hour (this slave was lagging by ~4 days and had more than 250 GB of unprocessed relay logs).
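
For a rough sense of scale, here is a back-of-the-envelope sketch of how long re-downloading that volume keeps a link busy. It assumes the re-fetched binary logs are about the same size as the 250 GB of relay logs and that the link is fully used; the link speeds and the function name `transfer_seconds` are illustrative assumptions, not figures from this report.

```python
def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Lower bound on the time to move size_bytes over a fully used link."""
    if link_bits_per_second <= 0:
        raise ValueError("link speed must be positive")
    return size_bytes * 8 / link_bits_per_second


# "more than 250 GB" from the report, taken as decimal gigabytes (assumption).
RELAY_LOG_BYTES = 250e9

# Hypothetical link speeds; the master's actual NIC speed is not stated.
for label, speed in (("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)):
    minutes = transfer_seconds(RELAY_LOG_BYTES, speed) / 60
    print(f"{label}: at least {minutes:.1f} minutes")
```

These are lower bounds only: protocol overhead and other traffic sharing the interface would stretch the transfer further.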

I could have been more careful and restarted only the SQL_THREAD (not the IO_THREAD), but there are still situations where restarting the IO_THREAD cannot be avoided (restarting MariaDB, for example).

It would be much better to avoid re-downloading binary logs on starting the IO_THREAD, as most of the relay logs on disk are still good.

Some more details in the following:
http://jfg-mysql.blogspot.nl/2015/10/bad-commands-with-mariadb-gtids-2.html

Thanks,

JFG

Issue Links

is duplicated by

MDEV-33645 Stop and Start slave reset the Master_info_file

Confirmed

relates to

MDEV-4698 With GTID replication, relay logs cannot be relied upon while purging binary logs on master

Open

MDEV-8959 change master to master_use_gtid= purges relay log and fetchs binlog from master

Closed

Activity

Elena Stepanova added a comment - 2015-10-17 23:41

See also MDEV-4698

Daniel Black added a comment - 2015-11-18 05:32

GTID indexing (~~MDEV-4991~~) is probably the first step toward solving this.

Jean-François Gagné added a comment - 2015-11-18 11:26

I am not sure I understand how GTID indexing would solve the problem.
My understanding is that the goal of GTID indexing is to reduce disk IO load on the master when a slave connects.
This MDEV is about avoiding maxing out the master NIC when a lagging slave re-connects to the master.
In my opinion, the best way to do that is to use the already downloaded relay logs on the slave (do not wipe them).

Daniel Black added a comment - 2015-11-18 12:02

Quite right, GTID indexing only helps in this scenario with the offset into the master's first binlog, to the extent that it has already been processed on the slave.

Daniel Black added a comment - 2015-12-16 02:35

FYI: I stumbled upon https://github.com/percona/percona-server/pull/240 (I haven't looked at the code).

People

Assignee: Unassigned

Reporter: Jean-François Gagné

Votes: 4

Watchers: 8

Dates

Created: 2015-10-15 08:10

Updated: 2024-06-24 13:37

MariaDB Server