[MDEV-17516] Replication lag issue using parallel replication Created: 2018-10-22  Updated: 2023-09-27

Status: Stalled
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.36
Fix Version/s: 10.4, 10.5

Type: Bug Priority: Major
Reporter: VAROQUI Stephane Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 1
Labels: seconds-behind-master

Issue Links:
Duplicate
is duplicated by MDEV-29639 Seconds_Behind_Master is incorrect fo... Closed
Relates
relates to MDEV-30458 Consolidate Serial Replica to Paralle... Open
relates to MDEV-31745 First Event After Starting a Delayed ... Open
relates to MDEV-7837 Seconds behind Master reports incorre... Stalled
relates to MDEV-32265 seconds_behind_master is inaccurate f... Closed

 Description   

With parallel replication, Seconds_Behind_Master wrongly reports 0 when the SQL thread is stopped and restarted a long time later.

This happens by design:
https://lists.launchpad.net/maria-developers/msg08958.html

but it is really a show stopper for most proxies, which send traffic to such a slave thinking it is in sync with the master.

My understanding is that Seconds_Behind_Master is computed after the first commit. In this case the master is 2 days ahead, and on a freshly restarted slave we get this:

|   30 | system user  |                      | tsce_unedic | Connect | 2211 | altering table                                                                 | OPTIMIZE TABLE `requetes` |    0.000 |

And we can see the wrong Seconds_Behind_Master:

        Seconds_Behind_Master: 0
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 0-21-28557589
                Parallel_Mode: conservative 

but on its master:

gtid_current_pos       | 0-21-28570301 

A possible solution would be to update Seconds_Behind_Master by injecting, at START SLAVE, a fake event carrying the max timestamp of all events read by the leader thread and sent to the worker threads.
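A minimal Python sketch of the idea above, under a simplified model (all class and method names here are illustrative, not MariaDB internals): a freshly restarted slave has applied no event yet, so SBM should be unknown rather than 0, and a synthetic event carrying the leader thread's max read timestamp immediately exposes the real gap.

```python
# Simplified model of a replica's Seconds_Behind_Master (SBM) bookkeeping.
# Names are illustrative, not MariaDB internals.
class SlaveClock:
    def __init__(self):
        self.last_master_ts = None  # timestamp of the last event seen by the SQL thread

    def on_event(self, master_ts):
        self.last_master_ts = master_ts

    def seconds_behind_master(self, now):
        if self.last_master_ts is None:
            return None  # unknown: no event applied yet (displayed as NULL)
        return max(0, int(now - self.last_master_ts))

clock = SlaveClock()
now = 1_000_000

# Freshly restarted slave: no event applied yet, SBM is unknown, not 0.
assert clock.seconds_behind_master(now) is None

# Proposed fix: at START SLAVE, inject a synthetic event carrying the max
# timestamp of everything the leader thread has already read, so SBM
# immediately reflects the real gap (here: 2 days = 172800 seconds).
clock.on_event(now - 2 * 86400)
assert clock.seconds_behind_master(now) == 172_800
```

The key property is that the "unknown" state is distinct from 0, so a proxy cannot mistake a stale slave for an in-sync one.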

To reproduce:

--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc
--let $rpl_topology=1->2
--source include/rpl_init.inc
 
# Test various aspects of parallel replication.
 
--connection server_1
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
 
CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
--save_master_pos
 
--connection server_2
--sync_with_master
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=4;
 
--connection server_1
--disable_warnings
INSERT INTO t1 VALUES (1, SLEEP(100));
--enable_warnings
# Let the slave fall further behind while its threads are stopped.
--sleep 100
INSERT INTO t1 VALUES (2, SLEEP(1));
 
--connection server_2
--source include/start_slave.inc
--let $status_items= Seconds_Behind_Master
--source include/show_slave_status.inc
--sync_with_master
--let $status_items= Seconds_Behind_Master
--source include/show_slave_status.inc

--source include/rpl_end.inc



 Comments   
Comment by Brandon Nesterenko [ 2022-10-11 ]

Note that an updated Seconds_Behind_Master strategy also needs to consider the behavior of delayed slaves. See MDEV-29639 for details.

Comment by Brandon Nesterenko [ 2022-10-21 ]

Hi Andrei!

This is ready for review. PR-2296

Comment by Brandon Nesterenko [ 2023-01-31 ]

The cause of Seconds_Behind_Master (SBM) reporting as 0 after a slave restart is that the relay log is empty until the IO thread is able to re-queue the events. That is, after STOP SLAVE has been issued, SBM is invalidated and presented as NULL by SHOW SLAVE STATUS. Currently, this state is maintained until the slave threads are running again. However, SBM will report as 0 for the time it takes the IO thread to receive an event from the master and queue it to the relay log, and until the SQL thread reads it. This is misleading: SBM is not actually 0, and should still be invalid because the last timestamp from the master is not yet known.

The proposed fix (from Elkin) is to extend the duration of the NULL/invalid state of SBM until the first event is read by the SQL thread.
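The state transitions described above can be sketched as a small Python model (the state names and helper are illustrative, not MariaDB internals): the window between START SLAVE and the SQL thread reading its first event keeps reporting NULL instead of a misleading 0.

```python
# Sketch of SBM reporting states around a slave restart, per the proposed
# fix: SBM stays NULL (invalid) from STOP SLAVE until the SQL thread reads
# its first event. Names are illustrative, not MariaDB internals.
STOPPED, STARTED_NO_EVENT, RUNNING = "stopped", "started_no_event", "running"

def report_sbm(state, now, last_master_ts):
    if state in (STOPPED, STARTED_NO_EVENT):
        return None  # displayed as NULL: last master timestamp unknown
    return max(0, int(now - last_master_ts))

# Before the fix, the STARTED_NO_EVENT window reported 0; with the fix it
# stays NULL until the first event is read.
assert report_sbm(STOPPED, 100, None) is None
assert report_sbm(STARTED_NO_EVENT, 100, None) is None
assert report_sbm(RUNNING, 100, 40) == 60
```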

Comment by VAROQUI Stephane [ 2023-02-01 ]

NULL looks like a bad idea; what about 666? With a valid numeric value, none of the monitoring tools around would have to be changed.

Unless the IO thread reports "No" while Seconds_Behind_Master is reporting NULL.

Best would be to refetch the first event from the relay log to extract the last event time ever replayed on the rejoining node, and compute the delay as current_time - first_relay_time. Later, once connected, set it to 0 if the first event received from the leader has the same GTID as this one; I don't think this could happen, because if multiple entries exist in the relay log, it is because of a lagging situation.
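The estimate suggested above could be sketched as follows (hypothetical helper name, not a MariaDB internal): report the age of the oldest unapplied relay-log event until fresher events arrive.

```python
# Sketch: on rejoin, read the timestamp of the first event still in the
# relay log and report current_time - first_relay_time as the delay
# estimate. Hypothetical helper, not MariaDB internals.
def estimated_sbm(current_time, first_relay_time):
    return max(0, int(current_time - first_relay_time))

# Slave rejoins with a relay log whose oldest unapplied event is 2 days old:
assert estimated_sbm(200_000, 200_000 - 172_800) == 172_800
```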

For my knowledge, what reasons trigger the relay log to be deleted? If its events get replayed, can the time be read from the SQL thread?

Comment by Andrei Elkin [ 2023-02-01 ]

stephane@skysql.com, you have a good point and a nice recommendation. It's better not to endanger monitoring tools.
The relay log can be deleted at the slave start time due to the gtid slave setup via Change-Master...master_use_gtid=slave_pos (as well as by the legacy --relay-log-recovery).
It's not a big deal to memorize the last gained SBM and use it to compute an estimate while the first post-restart event is still coming.
In case the last SBM is unknown (e.g. after a slave server restart), a maximum SBM would be displayed (rather than zero, as currently).
bnestere, what would you say?

Comment by VAROQUI Stephane [ 2023-02-01 ]

"Memorize the last gained SBM" is not valid for a STOP SLAVE with no delay where the next event is a long query.

To preserve the definition of SBM = the time difference between the last event in the queue and the oldest event in the queue being COMMITTED:
I'm still curious why a heartbeat from the leader, enriched with the timestamp of the last binary log event, would not be more accurate; a slave would not start fetching events before the first heartbeat. The SBM definition then becomes the time difference between the last event on the leader and the oldest event COMMITTED in the queue.
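A toy worked example contrasting the two definitions being discussed (all timestamps are made up for illustration): the heartbeat-based definition also accounts for events the IO thread has not fetched yet.

```python
# Toy timeline: the slave has committed up to an event stamped 900; the
# newest event queued in its relay log is stamped 950; the leader's newest
# binlog event (known via heartbeat) is stamped 990.
queued_newest, committed_ts, leader_newest = 950, 900, 990

# Definition 1: last event in the queue vs oldest queued event being committed.
sbm_queue = queued_newest - committed_ts    # = 50

# Definition 2 (heartbeat-based): last event on the leader vs oldest event
# COMMITTED in the queue; also captures IO-thread / network lag.
sbm_leader = leader_newest - committed_ts   # = 90

assert (sbm_queue, sbm_leader) == (50, 90)
```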

Comment by VAROQUI Stephane [ 2023-02-01 ]

Andrei, I have checked: the state Seconds_Behind_Master: NULL with Slave_IO_Running: No already exists, so it will not break any tool. So initializing SBM to NULL, and transitioning Slave_IO_Running to Yes after the first event fetch, is correct.

Comment by Andrei Elkin [ 2023-02-01 ]

stephane@skysql.com, on the heartbeat exploitation, your direction is great. Just not via the heartbeat itself: when necessary, it's feasible to add something like you propose to the master-slave connection handshake.
E.g. when the slave service is started for the first time on a (restarted) slave server. With the recommended CHANGE MASTER ... master_use_gtid=slave_pos, at handshake time the slave would receive back the end-of-transaction timestamp corresponding to its last executed GTID (without this timestamp piece, the slave would be aware only of the GTID details of its last executed transaction).
This measure refines 'a maximum SBM'.

Comment by VAROQUI Stephane [ 2023-02-01 ]

A handshake to both of you, then.

Comment by VAROQUI Stephane [ 2023-02-01 ]

At the same time, handshake + heartbeat would refine SBM by accounting for network time and long-waiting requests.

Generated at Thu Feb 08 08:37:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.