[MDEV-13895] GTID and Master_Delay causes excessive initial delay - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.2.8, 10.2(EOL)
Fix Version/s: 10.2.24, 10.3.14, 10.4.4
Component/s: Replication
Labels:
- Seconds_Behind_Master
- seconds-behind-master
Environment:
RHEL 6 x86_64

Description

When attempting to delay a Slave attached with GTID, there appears to be an extra delay applied initially. For example, this output reflects a Slave that is already delayed by 43200 seconds. When switching to GTID replication, replication is paused until SQL_Remaining_Delay counts down to 0:

MariaDB [(none)]> stop slave; CHANGE MASTER TO master_use_gtid=current_pos; CHANGE MASTER TO MASTER_DELAY=43200; start slave;

Relay_Master_Log_File: bin-log.018039
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 625
Seconds_Behind_Master: 44847
Using_Gtid: Current_Pos
SQL_Delay: 43200
SQL_Remaining_Delay: 43089
Slave_SQL_Running_State: Waiting until MASTER_DELAY seconds after master executed event

When switching to conventional replication coordinates, the delay is applied as expected:

MariaDB [(none)]> change master to master_log_file='bin-log.018040', master_log_pos=20289860;
Query OK, 0 rows affected (0.25 sec)

Relay_Master_Log_File: bin-log.018040
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 28182467
Seconds_Behind_Master: 43636
Using_Gtid: No
SQL_Delay: 43200
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Unlocking tables

Attachments

Issue Links

relates to

MDEV-17469 Replication Slave + SQL_Delay

Closed

Activity

Ascending order - Click to sort in descending order

Elena Stepanova added a comment - 2017-09-30 17:20

Thanks for the report.

The test below is for reproducing only, do not put it into the regression suite!

The test sets delay to 20 seconds, then runs 2 inserts on master with 10-second interval, and checks status on the slave over the next minute.
Comments show which result is expect at which step.
With master_use_gtid=no everything works as expected. With master_use_gtid=current_pos, indeed, there is an extra delay:

####################################

# After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row)

Slave_IO_Running = 'Yes'

Slave_SQL_Running = 'Yes'

Seconds_Behind_Master = '23'

SQL_Delay = '20'

SQL_Remaining_Delay = '17'

select * from t1;

####################################

# After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows)

Slave_IO_Running = 'Yes'

Slave_SQL_Running = 'Yes'

Seconds_Behind_Master = '33'

SQL_Delay = '20'

SQL_Remaining_Delay = '7'

select * from t1;

At the end the slave catches up.

mdev13895.test
# The test uses env variable USE_GTID. Set it to 'no' or 'current_pos'

--source include/master-slave.inc
--source include/have_binlog_format_mixed.inc

create table t1 (i int);

--sync_slave_with_master
stop slave;
eval change master to master_use_gtid= $USE_GTID;
change master to master_delay= 20;
start slave;

--connection master
insert into t1 values (1);
--sleep 10
insert into t1 values (2);

--let $status_items= Slave_IO_Running,Slave_SQL_Running,Seconds_Behind_Master,SQL_Delay,SQL_Remaining_Delay

--connection slave
--echo ####################################
--echo # First check, 10 sec since 1st insert (expected behind ~10, remaining ~10, no rows)
--source include/show_slave_status.inc
select * from t1;
--sleep 5
--echo ####################################
--echo # After 5 seconds waiting (expected behind ~15, remaining ~5, no rows)
--source include/show_slave_status.inc
select * from t1;
--sleep 8
--echo ####################################
--echo # After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row)
--source include/show_slave_status.inc
select * from t1;
--sleep 10
--echo ####################################
--echo # After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows)
--source include/show_slave_status.inc
select * from t1;
--sleep 10
--echo ####################################
--echo # After 33 seconds waiting (expected behind 0, remaining NULL, 2 rows)
--source include/show_slave_status.inc
select * from t1;


# Cleanup

stop slave;
change master to master_use_gtid=no, master_delay=0;
start slave;
--connection master
drop table t1;
--sync_slave_with_master
--connection master
--source include/rpl_end.inc

Elena Stepanova added a comment - 2017-09-30 17:20 Thanks for the report. The test below is for reproducing only, do not put it into the regression suite! The test sets delay to 20 seconds, then runs 2 inserts on master with 10-second interval, and checks status on the slave over the next minute. Comments show which result is expect at which step. With master_use_gtid=no everything works as expected. With master_use_gtid=current_pos , indeed, there is an extra delay: #################################### # After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row) Slave_IO_Running = 'Yes' Slave_SQL_Running = 'Yes' Seconds_Behind_Master = '23' SQL_Delay = '20' SQL_Remaining_Delay = '17' select * from t1; i #################################### # After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows) Slave_IO_Running = 'Yes' Slave_SQL_Running = 'Yes' Seconds_Behind_Master = '33' SQL_Delay = '20' SQL_Remaining_Delay = '7' select * from t1; i At the end the slave catches up. mdev13895.test # The test uses env variable USE_GTID. Set it to 'no' or 'current_pos' --source include/master-slave.inc --source include/have_binlog_format_mixed.inc create table t1 (i int ); --sync_slave_with_master stop slave; eval change master to master_use_gtid= $USE_GTID; change master to master_delay= 20; start slave; --connection master insert into t1 values (1); --sleep 10 insert into t1 values (2); --let $status_items= Slave_IO_Running,Slave_SQL_Running,Seconds_Behind_Master,SQL_Delay,SQL_Remaining_Delay --connection slave --echo #################################### --echo # First check, 10 sec since 1st insert (expected behind ~10, remaining ~10, no rows) --source include/show_slave_status.inc select * from t1; --sleep 5 --echo #################################### --echo # After 5 seconds waiting (expected behind ~15, remaining ~5, no rows) --source include/show_slave_status.inc select * from t1; --sleep 8 --echo #################################### --echo # After 13 seconds waiting (expected behind ~13, remaining ~7, 1 row) --source include/show_slave_status.inc select * from t1; --sleep 10 --echo #################################### --echo # After 23 seconds waiting (expected behind 0, remaining NULL, 2 rows) --source include/show_slave_status.inc select * from t1; --sleep 10 --echo #################################### --echo # After 33 seconds waiting (expected behind 0, remaining NULL, 2 rows) --source include/show_slave_status.inc select * from t1; # Cleanup stop slave; change master to master_use_gtid= no , master_delay=0; start slave; --connection master drop table t1; --sync_slave_with_master --connection master --source include/rpl_end.inc

Andrei Elkin added a comment - 2018-10-29 14:34

Most probably the case hosts ~~MDEV-17469~~'s greater-than-expected SBM which also happens there having GTID replication mode.
I suggest we follow up from here esp having a valuable how-to-repeat from Elena.

Andrei Elkin added a comment - 2018-10-29 14:34 Most probably the case hosts MDEV-17469 's greater-than-expected SBM which also happens there having GTID replication mode. I suggest we follow up from here esp having a valuable how-to-repeat from Elena.

Sujatha Sivakumar (Inactive) added a comment - 2019-03-11 12:32

Hello Andrei,

Please refer the following branch for the patch.

bb-10.2-~~MDEV-13895~~

Kindly review the changes.

Sujatha Sivakumar (Inactive) added a comment - 2019-03-11 12:32 Hello Andrei, Please refer the following branch for the patch. bb-10.2- MDEV-13895 Kindly review the changes.

Andrei Elkin added a comment - 2019-03-25 16:48

The patch looks good, thank for this piece of work and analysis!

Andrei Elkin added a comment - 2019-03-25 16:48 The patch looks good, thank for this piece of work and analysis!

Sujatha Sivakumar (Inactive) added a comment - 2019-03-28 15:16

Fix for this issue is implemented as part of following commit in 10.2.24.

e42192d7b3821640bcf18c58dc303a2338c6d1aa

Sujatha Sivakumar (Inactive) added a comment - 2019-03-28 15:16 Fix for this issue is implemented as part of following commit in 10.2.24. e42192d7b3821640bcf18c58dc303a2338c6d1aa

MariaDB Server

GTID and Master_Delay causes excessive initial delay

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration