MariaDB Server / MDEV-8945

Avoid overloading the master NIC when restarting the IO_THREAD on a lagging slave

Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • Component/s: Replication
    • None

    Description

      When GTID slave negotiation is enabled, the relay logs are wiped when the IO_THREAD starts. This might not be a big issue in most cases, but it is very bad on a lagging slave. I recently ran "STOP SLAVE; START SLAVE UNTIL ...;" on a GTID-enabled slave, and it saturated the master's network interface for more than one hour (this slave was lagging by ~4 days and had more than 250 GB of unprocessed relay logs).
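
      As a rough lower bound on how long re-fetching the unprocessed relay logs keeps the master's NIC busy, divide the volume by the link rate. The link speeds below are assumptions (the report does not state the NIC speed), and real throughput stays below line rate, so actual times are longer.

```python
def transfer_seconds(gigabytes: float, link_gbit_per_s: float) -> float:
    """Lower-bound time to push `gigabytes` (decimal GB) through a link
    running at `link_gbit_per_s` Gbit/s at full line rate."""
    bits = gigabytes * 8e9  # 1 GB = 8e9 bits in decimal units
    return bits / (link_gbit_per_s * 1e9)

# 250 GB of unprocessed relay logs, as in the report; link speeds are assumed.
for speed in (1.0, 10.0):
    minutes = transfer_seconds(250, speed) / 60
    print(f"{speed:>4} Gbit/s: at least {minutes:.0f} min")
```

      At an assumed 1 Gbit/s this is over half an hour of a fully saturated link even under ideal conditions.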

      I could have been more careful and restarted only the SQL_THREAD (not the IO_THREAD), but there are still situations where restarting the IO_THREAD cannot be avoided (restarting MariaDB, for example).

      It would be much better to avoid re-downloading binary logs when starting the IO_THREAD, since much of the relay log data already on disk is still good.
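
      A minimal sketch of why keeping the relay logs matters, assuming a simplified model with a single GTID domain and whole transactions only. The names (Event, bytes_resent) are hypothetical and this is not MariaDB's implementation: it compares how many bytes the master resends if the slave resumes after the last applied transaction (relay logs wiped) versus after the last transaction already stored in its relay log.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One transaction in a single GTID replication domain (simplified)."""
    seq_no: int      # sequence number, as in a GTID like 0-1-<seq_no>
    size_bytes: int

def bytes_resent(master_binlog: list[Event], relay_log: list[Event],
                 last_applied_seq: int, keep_relay_log: bool) -> int:
    """Bytes the master sends when the slave's IO thread reconnects.

    keep_relay_log=False models the behaviour described in this issue: the
    relay log is wiped and the slave resumes after the last *applied*
    transaction. keep_relay_log=True models the proposal: the slave resumes
    after the last transaction already stored in its relay log.
    """
    if keep_relay_log and relay_log:
        resume_after = relay_log[-1].seq_no
    else:
        resume_after = last_applied_seq
    return sum(e.size_bytes for e in master_binlog if e.seq_no > resume_after)

# Hypothetical numbers: 10 transactions of 100 bytes on the master,
# 8 already in the slave's relay log, only 3 applied (a lagging slave).
binlog = [Event(seq, 100) for seq in range(1, 11)]
relay = binlog[:8]
print(bytes_resent(binlog, relay, 3, keep_relay_log=False))  # 700
print(bytes_resent(binlog, relay, 3, keep_relay_log=True))   # 200
```

      A real implementation would also have to handle per-domain positions and a relay log that ends in a partially received transaction, which this model ignores.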

      Some more details are in the following post:
      http://jfg-mysql.blogspot.nl/2015/10/bad-commands-with-mariadb-gtids-2.html

      Thanks,

      JFG

          Activity

            elenst Elena Stepanova added a comment - See also MDEV-4698
            danblack Daniel Black added a comment -

            GTID indexing (MDEV-4991) is probably the first step solving this.

            jgagne Jean-François Gagné added a comment -

            I am not sure I understand how GTID indexing would solve the problem.
            My understanding is that the goal of GTID indexing is to reduce the disk I/O load on the master when a slave connects.
            This MDEV is about avoiding maxing out the master NIC when a lagging slave reconnects to the master.
            In my opinion, the best way to do that is to use the relay logs already downloaded on the slave (do not wipe them).

            danblack Daniel Black added a comment -

            Quite right, in this scenario GTID indexing only helps by finding the offset in the master's first binlog at which to start, skipping what the slave has already processed.
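
            To make the distinction in this thread concrete, here is a hypothetical sparse GTID-to-offset index for one binlog file in a single domain. It is an assumed structure, not the design of MDEV-4991: the index lets the master seek close to where a reconnecting slave wants to resume instead of reading the file from the start, which saves disk I/O.

```python
import bisect

def scan_start_offset(index_seqs: list[int], index_offsets: list[int],
                      first_wanted_seq: int) -> int:
    """Offset at which the master can start scanning a binlog file for
    the transaction with sequence number `first_wanted_seq`.

    index_seqs must be sorted ascending; index_offsets[i] is the byte offset
    where the transaction with sequence number index_seqs[i] starts. Without
    a usable index entry the scan falls back to the start of the file.
    """
    # Last indexed transaction that does not come after the wanted one.
    i = bisect.bisect_right(index_seqs, first_wanted_seq) - 1
    return index_offsets[i] if i >= 0 else 0

# Hypothetical sparse index: one entry every 1000 transactions of 500 bytes.
seqs = list(range(1, 4002, 1000))            # [1, 1001, 2001, 3001, 4001]
offsets = [(s - 1) * 500 for s in seqs]
print(scan_start_offset(seqs, offsets, 2500))  # 1000000 instead of 0
```

            Either way, the master still sends every transaction after that point over the network, which is why an index alone does not address the NIC saturation described in this issue.
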
            danblack Daniel Black added a comment -

            FYI: I stumbled upon https://github.com/percona/percona-server/pull/240 (I haven't looked at the code).

            People

              Assignee: Unassigned
              Reporter: Jean-François Gagné
              Votes: 4
              Watchers: 8
