Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13895

GTID and Master_Delay causes excessive initial delay

Details

    Description

      When attempting to delay a Slave attached with GTID, there appears to be an extra delay applied initially. For example, this output reflects a Slave that is already delayed by 43200 seconds. When switching to GTID replication, replication is paused until SQL_Remaining_Delay counts down to 0:

      MariaDB [(none)]> stop slave; CHANGE MASTER TO master_use_gtid=current_pos; CHANGE MASTER TO MASTER_DELAY=43200; start slave;

      Relay_Master_Log_File: bin-log.018039
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Exec_Master_Log_Pos: 625
      Seconds_Behind_Master: 44847
      Using_Gtid: Current_Pos
      SQL_Delay: 43200
      SQL_Remaining_Delay: 43089
      Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event

      When switching to conventional replication coordinates, the delay is applied as expected:

      MariaDB [(none)]> change master to master_log_file='bin-log.018040', master_log_pos=20289860;
      Query OK, 0 rows affected (0.25 sec)

      Relay_Master_Log_File: bin-log.018040
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Exec_Master_Log_Pos: 28182467
      Seconds_Behind_Master: 43636
      Using_Gtid: No
      SQL_Delay: 43200
      SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Unlocking tables

      Attachments

        Issue Links

          Activity

            Thanks for the report.

            The test below is for reproducing only, do not put it into the regression suite!

            The test sets delay to 20 seconds, then runs 2 inserts on master with 10-second interval, and checks status on the slave over the next minute.
            Comments show which result is expect at which step.
            With master_use_gtid=no everything works as expected. With master_use_gtid=current_pos, indeed, there is an extra delay:

            ####################################
            # After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row)
            Slave_IO_Running = 'Yes'
            Slave_SQL_Running = 'Yes'
            Seconds_Behind_Master = '23'
            SQL_Delay = '20'
            SQL_Remaining_Delay = '17'
            select * from t1;
            i
            ####################################
            # After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows)
            Slave_IO_Running = 'Yes'
            Slave_SQL_Running = 'Yes'
            Seconds_Behind_Master = '33'
            SQL_Delay = '20'
            SQL_Remaining_Delay = '7'
            select * from t1;
            i
            

            At the end the slave catches up.

            mdev13895.test

            # The test uses env variable USE_GTID. Set it to 'no' or 'current_pos'
             
            --source include/master-slave.inc
            --source include/have_binlog_format_mixed.inc
             
            create table t1 (i int);
             
            --sync_slave_with_master
            stop slave;
            eval change master to master_use_gtid= $USE_GTID;
            change master to master_delay= 20;
            start slave;
             
            --connection master
            insert into t1 values (1);
            --sleep 10
            insert into t1 values (2);
             
            --let $status_items= Slave_IO_Running,Slave_SQL_Running,Seconds_Behind_Master,SQL_Delay,SQL_Remaining_Delay
             
            --connection slave
            --echo ####################################
            --echo # First check, 10 sec since 1st insert (expected behind ~10, remaining ~10, no rows)
            --source include/show_slave_status.inc
            select * from t1;
            --sleep 5
            --echo ####################################
            --echo # After 5 seconds waiting (expected behind ~15, remaining ~5, no rows)
            --source include/show_slave_status.inc
            select * from t1;
            --sleep 8
            --echo ####################################
            --echo # After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row)
            --source include/show_slave_status.inc
            select * from t1;
            --sleep 10
            --echo ####################################
            --echo # After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows)
            --source include/show_slave_status.inc
            select * from t1;
            --sleep 10
            --echo ####################################
            --echo # After 33 seconds waiting (expected behind 0, remaining NULL, 2 rows)
            --source include/show_slave_status.inc
            select * from t1;
             
             
            # Cleanup
             
            stop slave;
            change master to master_use_gtid=no, master_delay=0;
            start slave;
            --connection master
            drop table t1;
            --sync_slave_with_master
            --connection master
            --source include/rpl_end.inc
            

            elenst Elena Stepanova added a comment - Thanks for the report. The test below is for reproducing only, do not put it into the regression suite! The test sets delay to 20 seconds, then runs 2 inserts on master with 10-second interval, and checks status on the slave over the next minute. Comments show which result is expect at which step. With master_use_gtid=no everything works as expected. With master_use_gtid=current_pos , indeed, there is an extra delay: #################################### # After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row) Slave_IO_Running = 'Yes' Slave_SQL_Running = 'Yes' Seconds_Behind_Master = '23' SQL_Delay = '20' SQL_Remaining_Delay = '17' select * from t1; i #################################### # After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows) Slave_IO_Running = 'Yes' Slave_SQL_Running = 'Yes' Seconds_Behind_Master = '33' SQL_Delay = '20' SQL_Remaining_Delay = '7' select * from t1; i At the end the slave catches up. mdev13895.test # The test uses env variable USE_GTID. Set it to 'no' or 'current_pos'   --source include/master-slave.inc --source include/have_binlog_format_mixed.inc   create table t1 (i int );   --sync_slave_with_master stop slave; eval change master to master_use_gtid= $USE_GTID; change master to master_delay= 20; start slave;   --connection master insert into t1 values (1); --sleep 10 insert into t1 values (2);   --let $status_items= Slave_IO_Running,Slave_SQL_Running,Seconds_Behind_Master,SQL_Delay,SQL_Remaining_Delay   --connection slave --echo #################################### --echo # First check, 10 sec since 1st insert (expected behind ~10, remaining ~10, no rows) --source include/show_slave_status.inc select * from t1; --sleep 5 --echo #################################### --echo # After 5 seconds waiting (expected behind ~15, remaining ~5, no rows) --source include/show_slave_status.inc select * from t1; --sleep 8 --echo #################################### --echo # After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row) --source include/show_slave_status.inc select * from t1; --sleep 10 --echo #################################### --echo # After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows) --source include/show_slave_status.inc select * from t1; --sleep 10 --echo #################################### --echo # After 33 seconds waiting (expected behind 0, remaining NULL, 2 rows) --source include/show_slave_status.inc select * from t1;     # Cleanup   stop slave; change master to master_use_gtid= no , master_delay=0; start slave; --connection master drop table t1; --sync_slave_with_master --connection master --source include/rpl_end.inc
            Elkin Andrei Elkin added a comment -

            Most probably the case hosts MDEV-17469's greater-than-expected SBM which also happens there having GTID replication mode.
            I suggest we follow up from here esp having a valuable how-to-repeat from Elena.

            Elkin Andrei Elkin added a comment - Most probably the case hosts MDEV-17469 's greater-than-expected SBM which also happens there having GTID replication mode. I suggest we follow up from here esp having a valuable how-to-repeat from Elena.

            Hello Andrei,

            Please refer the following branch for the patch.

            bb-10.2-MDEV-13895

            Kindly review the changes.

            sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment - Hello Andrei, Please refer the following branch for the patch. bb-10.2- MDEV-13895 Kindly review the changes.
            Elkin Andrei Elkin added a comment -

            The patch looks good, thank for this piece of work and analysis!

            Elkin Andrei Elkin added a comment - The patch looks good, thank for this piece of work and analysis!

            Fix for this issue is implemented as part of following commit in 10.2.24.

            e42192d7b3821640bcf18c58dc303a2338c6d1aa

            sujatha.sivakumar Sujatha Sivakumar (Inactive) added a comment - Fix for this issue is implemented as part of following commit in 10.2.24. e42192d7b3821640bcf18c58dc303a2338c6d1aa

            People

              sujatha.sivakumar Sujatha Sivakumar (Inactive)
              zam1081 Marcus Zambrano
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.