Description
Delayed replicas (those using the MASTER_DELAY option of CHANGE MASTER TO) that are also configured to use parallel threads calculate Seconds_Behind_Master incorrectly. This commit changed parallel replicas to update Seconds_Behind_Master at the time of transaction commit. On a delayed replica, however, an event's Seconds_Behind_Master is not calculated until MASTER_DELAY seconds have passed and the event has finished executing. In other words, when a new event is received, Seconds_Behind_Master is calculated from the timestamp of the last committed event, which can produce very large values of Seconds_Behind_Master for the entire duration of MASTER_DELAY. The effect is most pronounced for workloads with infrequent transactions.
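The arithmetic behind the inflated value can be illustrated with a small model. This is only a sketch of the behavior described above, not the server's code: the function `reported_sbm` and its parameters are hypothetical, clocks are assumed to be synchronized, and times are whole seconds. The numbers mirror the MTR test in this report (a 10-second gap between transactions and MASTER_DELAY=4).

```python
from typing import Optional


def reported_sbm(now: int,
                 last_committed_ts: int,
                 pending_event_ts: Optional[int],
                 commit_time_only: bool) -> int:
    """Toy model of Seconds_Behind_Master (hypothetical, not server code).

    commit_time_only=True models the behavior described in this report:
    the reference timestamp only advances when a transaction commits.
    commit_time_only=False models using the timestamp of the event that
    is currently waiting out MASTER_DELAY.
    """
    if pending_event_ts is None:
        # Nothing received and not yet applied: the replica is idle.
        return 0
    reference = last_committed_ts if commit_time_only else pending_event_ts
    return now - reference


MASTER_DELAY = 4       # seconds, as in the test
last_committed = 0     # previous transaction, already applied on the replica
new_event = 10         # next transaction, written after a 10-second pause

# The new event waits on the replica from t=10 until t=10+MASTER_DELAY.
for now in range(new_event, new_event + MASTER_DELAY):
    from_commit = reported_sbm(now, last_committed, new_event, True)
    from_event = reported_sbm(now, last_committed, new_event, False)
    print(f"t={now}: commit-based={from_commit}, event-based={from_event}")
```

Under these assumptions, the commit-based value starts at 10 the moment the event arrives and keeps growing for the whole delay window, while a value based on the waiting event's own timestamp would start at 0 and stay below MASTER_DELAY.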
The following MTR test highlights this issue:
--source include/master-slave.inc
--source include/have_binlog_format_row.inc

--echo #
--echo # Initialize test data
--connection master
create table t1 (a int);
insert into t1 values (1);
--source include/save_master_gtid.inc

--connection slave
--source include/sync_with_master_gtid.inc
--source include/stop_slave.inc
CHANGE MASTER TO MASTER_DELAY=4, MASTER_USE_GTID=Slave_Pos;
set @@global.slave_parallel_threads= 4;
--source include/start_slave.inc

--echo # Set up a long interval between now and the next event to boost SBM
--connection master
--sleep 10

--let $ctr=8
while($ctr)
{
  --connection slave

  # On the first iteration, SBM will be 0 because there are no new events
  --let $status_items= Seconds_Behind_Master
  --source include/show_slave_status.inc

  --connection master
  --eval insert into t1 values ($ctr)
  --send select sleep(1)
  --dec $ctr

  # On the first iteration, SBM will boost to 10 because of the long
  # interval, despite only just receiving the event
  --connection slave
  --source include/show_slave_status.inc

  --connection master
  --reap
}


--echo #
--echo # Cleanup
--connection master
DROP TABLE t1;
--source include/save_master_gtid.inc

--connection slave
--source include/sync_with_master_gtid.inc
--source include/stop_slave.inc
CHANGE MASTER TO MASTER_DELAY=0;
set @@global.slave_parallel_threads= 0;
--source include/start_slave.inc

--source include/rpl_end.inc

--echo # End of tests

Issue Links
- causes
  - MDEV-30619 Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers (Closed)
- duplicates
  - MDEV-17516 Replication lag issue using parallel replication (Stalled)
- relates to
  - MDEV-30458 Consolidate Serial Replica to Parallel Replica with 1 Worker Thread (Open)
  - MDEV-30608 rpl.rpl_delayed_parallel_slave_sbm sometimes fails with Seconds_Behind_Master should not have used second transaction timestamp (Closed)
  - MDEV-32265 seconds_behind_master is inaccurate for Delayed replication (Closed)
  - MDEV-34778 Inconsistent Seconds_Behind_Master after slave stop+start (rpl.rpl_old_master sporadic failure) (Open)
  - MDEV-31745 First Event After Starting a Delayed Parallel Replica Shows 0 Seconds_Behind_Master (Open)