MariaDB Server / MDEV-17516

Replication lag issue using parallel replication

Details

    Description

      With parallel replication, Seconds_Behind_Master wrongly reports 0 when the SQL thread is stopped and then restarted a long time later.

      This happens by design:
      https://lists.launchpad.net/maria-developers/msg08958.html

      but it is a real showstopper for most proxies, which send traffic to such a slave believing it is in sync with the master.

      My understanding is that Seconds_Behind_Master is computed only after the first commit, so in this case the master is 2 days ahead and on a freshly restarted slave we get this:

      |   30 | system user  |                      | tsce_unedic | Connect | 2211 | altering table                                                                 | OPTIMIZE TABLE `requetes` |    0.000 |
      

      And we can see the wrong Seconds_Behind_Master:

              Seconds_Behind_Master: 0
                         Using_Gtid: Slave_Pos
                        Gtid_IO_Pos: 0-21-28557589
                      Parallel_Mode: conservative 
      

      but on its master:

      gtid_current_pos       | 0-21-28570301 
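
      The two positions above already show that the slave is not in sync, even though Seconds_Behind_Master says 0. As a rough illustration of a check a proxy could run instead, the sketch below compares the sequence numbers of two GTIDs. The helper names are hypothetical, and it assumes a single replication domain, as in the values from this report. Note that Gtid_IO_Pos only reflects what the IO thread has received, so the real apply lag may be larger; a monitor would more likely compare against gtid_slave_pos.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse a single MariaDB GTID of the form domain-server-sequence."""
    domain, server, seq = (int(part) for part in text.strip().split("-"))
    return Gtid(domain, server, seq)


def transactions_behind(slave_pos: str, master_pos: str) -> int:
    """Return how many sequence numbers the slave is missing (hypothetical helper)."""
    slave, master = parse_gtid(slave_pos), parse_gtid(master_pos)
    if slave.domain_id != master.domain_id:
        raise ValueError("positions belong to different replication domains")
    return max(0, master.seq_no - slave.seq_no)


# Values taken from the report: Gtid_IO_Pos on the slave, gtid_current_pos on the master.
gap = transactions_behind("0-21-28557589", "0-21-28570301")
print(gap)  # 12712
```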
      

      A possible solution would be to update Seconds_Behind_Master on START SLAVE by injecting a fake event that carries the maximum timestamp of all events read by the leader thread, and sending it to the worker threads.
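
      To make the idea concrete, here is a toy model in Python. It is not server code: the function names are invented for illustration, and the "reported" variant only encodes the behaviour described in this report (no commit yet means 0). The second function seeds the reference timestamp from the events already read by the leader thread, which is the effect the fake event is meant to have.

```python
from typing import Iterable, Optional


def lag_as_reported(now: int, last_committed_ts: Optional[int]) -> int:
    """Behaviour described in the report: before the first commit the lag reads 0."""
    if last_committed_ts is None:
        return 0
    return max(0, now - last_committed_ts)


def lag_with_seed(now: int, queued_ts: Iterable[int],
                  last_committed_ts: Optional[int]) -> int:
    """Proposed behaviour (sketch): before the first commit, use the maximum
    timestamp of the events the leader thread has already read."""
    if last_committed_ts is None:
        last_committed_ts = max(queued_ts, default=None)
    if last_committed_ts is None:
        return 0  # nothing read yet, nothing to report
    return max(0, now - last_committed_ts)


TWO_DAYS = 2 * 24 * 3600
now = 1_540_000_000                            # arbitrary "current time" for the example
queued = [now - TWO_DAYS, now - TWO_DAYS + 60]  # master timestamps of queued events

print(lag_as_reported(now, None))        # 0
print(lag_with_seed(now, queued, None))  # 172740
```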

      To reproduce:

      --source include/have_innodb.inc
      --source include/have_binlog_format_mixed.inc
      --let $rpl_topology=1->2
      --source include/rpl_init.inc
       
      # Test various aspects of parallel replication.
       
      --connection server_1
      ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
       
      CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
      --save_master_pos
       
      --connection server_2
      --sync_with_master
      --source include/stop_slave_sql_thread.inc
      SET GLOBAL slave_parallel_threads=4;
       
      --connection server_2
      --sync_with_master
      --source include/stop_slave.inc
      SET GLOBAL slave_parallel_threads=1;
       
      --connection server_1
      --disable_warnings
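      INSERT INTO t1 VALUES (1, SLEEP(100));
      # Let time pass on the master while the slave is stopped.
      --sleep 100
      # Use a new key; inserting 1 again would fail with a duplicate-key error.
      INSERT INTO t1 VALUES (2, SLEEP(1));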
       
      --connection server_2
      --source include/start_slave.inc
      --let $status_items= Seconds_Behind_Master
      --source include/show_slave_status.inc
      --sync_with_master
      --let $status_items= Seconds_Behind_Master
      --source include/show_slave_status.inc
      

          Activity

            stephane@skysql.com VAROQUI Stephane created issue -
            elenst Elena Stepanova made changes -
            Assignee Andrei Elkin [ elkin ]
            Elkin Andrei Elkin made changes -
            Labels seconds-behind-master
            elenst Elena Stepanova made changes -
            Fix Version/s 10.1 [ 16100 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Sujatha Sivakumar [ sujatha.sivakumar ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10..4 [ 24902 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.5 [ 23123 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10..4 [ 24902 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.1 [ 16100 ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Sujatha Sivakumar [ sujatha.sivakumar ] Andrei Elkin [ elkin ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 90206 ] MariaDB v4 [ 140987 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ] Brandon Nesterenko [ JIRAUSER48702 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.2 [ 14601 ]
            bnestere Brandon Nesterenko made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            julien.fritsch Julien Fritsch made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            bnestere Brandon Nesterenko made changes -
            Assignee Brandon Nesterenko [ JIRAUSER48702 ] Andrei Elkin [ elkin ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            bnestere Brandon Nesterenko made changes -
            Assignee Andrei Elkin [ elkin ] Brandon Nesterenko [ JIRAUSER48702 ]
            bnestere Brandon Nesterenko made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            Elkin Andrei Elkin made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.3 [ 22126 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.4 [ 22408 ]

            People

              bnestere Brandon Nesterenko
              stephane@skysql.com VAROQUI Stephane
