[MDEV-17516] Replication lag issue using parallel replication - Jira

Details

Type: Bug
Status: Stalled
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.1.36
Fix Version/s: 10.5
Component/s: Replication
Labels:
- seconds-behind-master

Description

With parallel replication, Seconds_Behind_Master wrongly reports 0 when the SQL thread is stopped and then restarted a long time afterwards.

This happens by design:
https://lists.launchpad.net/maria-developers/msg08958.html

but it is a real show stopper for most proxies, which send traffic to such a slave believing it is in sync with the master.

My understanding is that Seconds_Behind_Master is computed after the first commit, so in this case the master is 2 days ahead, and on a freshly restarted slave we get this:

|   30 | system user  |                      | tsce_unedic | Connect | 2211 | altering table                                                                 | OPTIMIZE TABLE `requetes` |    0.000 |

And we can see the wrong Seconds_Behind_Master:

        Seconds_Behind_Master: 0

                   Using_Gtid: Slave_Pos

                  Gtid_IO_Pos: 0-21-28557589

                Parallel_Mode: conservative

but on its master:

gtid_current_pos       | 0-21-28570301
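
A monitoring check that does not rely on Seconds_Behind_Master alone could compare the two GTID positions shown above. The following is only a minimal sketch: the function names `parse_gtid` and `gtid_sequence_gap` are hypothetical, and it assumes a single replication domain and the `domain-server-sequence` GTID layout used by MariaDB.

```python
def parse_gtid(gtid: str) -> tuple[int, int, int]:
    """Split a MariaDB GTID 'domain-server-sequence' into integers."""
    domain, server, seq = gtid.strip().split("-")
    return int(domain), int(server), int(seq)


def gtid_sequence_gap(replica_pos: str, master_pos: str) -> int:
    """Number of sequence numbers the replica position is behind the master.

    Hypothetical helper: only meaningful when both positions belong to
    the same replication domain.
    """
    r_domain, _, r_seq = parse_gtid(replica_pos)
    m_domain, _, m_seq = parse_gtid(master_pos)
    if r_domain != m_domain:
        raise ValueError("positions belong to different replication domains")
    return m_seq - r_seq


# Values taken from the report: Gtid_IO_Pos on the slave,
# gtid_current_pos on the master.
gap = gtid_sequence_gap("0-21-28557589", "0-21-28570301")
print(gap)  # 12712
```

A non-zero gap while Seconds_Behind_Master reads 0 is the inconsistency described in this report.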

A possible solution would be to update Seconds_Behind_Master by injecting a fake event on START SLAVE, carrying the maximum timestamp of all events read by the leader thread, and sending it to the worker threads.

To reproduce :

--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc
--let $rpl_topology=1->2
--source include/rpl_init.inc

# Test various aspects of parallel replication.

--connection server_1
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
CREATE TABLE t1 (a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
--save_master_pos

--connection server_2
--sync_with_master
--source include/stop_slave_sql_thread.inc
SET GLOBAL slave_parallel_threads=4;

--connection server_2
--sync_with_master
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=1;

--connection server_1
--disable_warnings
INSERT INTO t1 VALUES (1, SLEEP(100));
--wait 100s
INSERT INTO t1 VALUES (1, SLEEP(1));

--connection server_2
--source include/start_slave.inc
--let $status_items= Seconds_Behind_Master
--source include/show_slave_status.inc
--sync_with_master
--let $status_items= Seconds_Behind_Master
--source include/show_slave_status.inc

Attachments

Issue Links

is duplicated by

MDEV-29639 Seconds_Behind_Master is incorrect for Delayed, Parallel Replicas

Closed

relates to

MDEV-30458 Consolidate Serial Replica to Parallel Replica with 1 Worker Thread

Open

MDEV-30619 Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers

Closed

MDEV-31745 First Event After Starting a Delayed Parallel Replica Shows 0 Seconds_Behind_Master

Open

MDEV-7837 Seconds behind Master reports incorrect value when Parallel replication is used

Closed

MDEV-32265 seconds_behind_master is inaccurate for Delayed replication

Closed

(1 relates to)

Activity

VAROQUI Stephane created issue - 2018-10-22 09:02

VAROQUI Stephane made changes - 2018-10-22 09:09

Field	Original Value	New Value
Description (proposed solution)	... injecting a fake event ion start slave with the timestamp of the first event read by the leader thread	... injecting a fake event in start slave with the max timestamp of all events read by the leader thread and send to to the worker threads .
Description (reproduction)	INSERT INTO t1 VALUES (1, SLEEP(3600));	INSERT INTO t1 VALUES (1, SLEEP(100)); --wait 100s INSERT INTO t1 VALUES (1, SLEEP(1));
Description (remaining text)	unchanged	unchanged

VAROQUI Stephane made changes - 2018-10-22 09:22

Link

This issue relates to MDEV-7837 [ MDEV-7837 ]

Elena Stepanova made changes - 2018-10-22 20:35

Assignee

Andrei Elkin [ elkin ]

Andrei Elkin made changes - 2018-11-13 18:41

Labels

seconds-behind-master

Elena Stepanova made changes - 2018-12-11 21:17

Fix Version/s

10.1 [ 16100 ]

Andrei Elkin made changes - 2019-06-03 10:21

Assignee

Andrei Elkin [ elkin ]

Sujatha Sivakumar [ sujatha.sivakumar ]

Julien Fritsch made changes - 2020-08-27 07:45

Fix Version/s		10..4 [ 24902 ]
Fix Version/s		10.2 [ 14601 ]
Fix Version/s		10.3 [ 22126 ]
Fix Version/s		10.5 [ 23123 ]

Sergei Golubchik made changes - 2020-09-07 15:02

Fix Version/s		10.4 [ 22408 ]
Fix Version/s	10..4 [ 24902 ]

Julien Fritsch made changes - 2020-11-06 16:07

Fix Version/s

10.1 [ 16100 ]

Julien Fritsch made changes - 2021-03-19 16:18

Assignee

Sujatha Sivakumar [ sujatha.sivakumar ]

Andrei Elkin [ elkin ]

Sergei Golubchik made changes - 2021-12-06 21:33

Workflow

MariaDB v3 [ 90206 ]

MariaDB v4 [ 140987 ]

Andrei Elkin made changes - 2022-02-08 14:10

Assignee

Andrei Elkin [ elkin ]

Brandon Nesterenko [ JIRAUSER48702 ]

Ralf Gebhardt made changes - 2022-08-04 08:44

Fix Version/s

10.2 [ 14601 ]

Brandon Nesterenko made changes - 2022-10-11 18:33

Status

Open [ 1 ]

Confirmed [ 10101 ]

Brandon Nesterenko made changes - 2022-10-11 18:37

Link

This issue is duplicated by MDEV-29639 [ MDEV-29639 ]

Brandon Nesterenko added a comment - 2022-10-11 18:39

Note that the update strategy for Seconds_Behind_Master also needs to consider the behavior of delayed slaves. See MDEV-29639 for details.

Julien Fritsch made changes - 2022-10-18 14:38

Status

Confirmed [ 10101 ]

In Progress [ 3 ]

Brandon Nesterenko added a comment - 2022-10-21 02:17

Hi Andrei!

This is ready for review. PR-2296

Brandon Nesterenko made changes - 2022-10-21 02:17

Assignee	Brandon Nesterenko [ JIRAUSER48702 ]	Andrei Elkin [ elkin ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Brandon Nesterenko made changes - 2023-01-24 16:49

Link

This issue relates to MDEV-30458 [ MDEV-30458 ]

Brandon Nesterenko made changes - 2023-01-30 16:42

Assignee

Andrei Elkin [ elkin ]

Brandon Nesterenko [ JIRAUSER48702 ]

Brandon Nesterenko made changes - 2023-01-30 16:43

Status

In Review [ 10002 ]

Stalled [ 10000 ]

Brandon Nesterenko added a comment - 2023-01-31 21:15

The cause of Seconds_Behind_Master (SBM) reporting 0 after a slave restart is that the relay log is empty until the IO thread is able to re-queue the events. That is, after STOP SLAVE has been issued, SBM is invalidated and presented as NULL by SHOW SLAVE STATUS. Currently, this state is maintained until the slave threads are running again. However, SBM will then report 0 for the time it takes the IO thread to receive an event from the master and queue it to the relay log, and until the SQL thread reads it. This is misleading: SBM is not actually 0, and should still be invalid because the latest timestamp from the master is not yet known.

The proposed fix (from Elkin) is to extend the duration of the NULL/invalid state of SBM until the first event is read by the SQL thread.
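
The difference between the current and the proposed behavior can be sketched as a small state model. This is only an illustration of the comment above, not the server's code: the function `reported_sbm` and its parameters are hypothetical, `None` stands for NULL, and the real computation has more cases than shown here.

```python
from typing import Optional


def reported_sbm(threads_running: bool,
                 first_event_read: bool,
                 last_event_ts: Optional[int],
                 now: int,
                 proposed_fix: bool) -> Optional[int]:
    """Simplified model of the value shown for Seconds_Behind_Master."""
    if not threads_running:
        # After STOP SLAVE the value is invalidated and shown as NULL.
        return None
    if not first_event_read:
        # Window between START SLAVE and the SQL thread reading its first
        # event: 0 today, NULL under the proposed fix.
        return None if proposed_fix else 0
    if last_event_ts is None:
        raise ValueError("an event was read, so its timestamp must be known")
    return max(0, now - last_event_ts)


# Just after START SLAVE, before any event reached the SQL thread:
print(reported_sbm(True, False, None, now=1000, proposed_fix=False))  # 0
print(reported_sbm(True, False, None, now=1000, proposed_fix=True))   # None
```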

VAROQUI Stephane added a comment - 2023-02-01 06:50 - edited

NULL looks like a bad idea; what about 666? With a valid numeric value, none of the monitoring tools would have to be changed.

Unless io_thread reports "No" while Seconds_Behind_Master is reporting NULL.

The best option would be to re-fetch the first event from the relay log, extract the time of the last event ever applied by the rejoining node, and compute the delay as current_time - first_relay_time. Later, once connected, it could be set to 0 if the first event received from the leader has the same GTID as that one. I don't think this can happen, though: if multiple entries exist in the relay log, it is because the slave is late.

For my own knowledge: what triggers the relay log to be deleted? If the events get replayed, the time can be read by the sql_thread.
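
The estimate suggested in this comment (current_time - first_relay_time) amounts to the following. It is a minimal sketch only: `estimated_delay` is a hypothetical helper, and it assumes the timestamp of the first relay log event is available and that both clocks are comparable.

```python
import time
from typing import Optional


def estimated_delay(first_relay_event_ts: float,
                    now: Optional[float] = None) -> float:
    """Delay estimate in seconds: current time minus first relay event time."""
    if now is None:
        now = time.time()
    # Clamp at zero so clock skew never yields a negative delay.
    return max(0.0, now - first_relay_event_ts)


# A relay event stamped 100 seconds before "now" gives a 100 second delay.
print(estimated_delay(1000.0, now=1100.0))  # 100.0
```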

Andrei Elkin added a comment - 2023-02-01 11:57 - edited

stephane@skysql.com, you have a good point and a nice recommendation. It's better not to endanger monitoring tools.
The relay log can be deleted at slave start time due to the GTID slave setup via CHANGE MASTER ... master_use_gtid=slave_pos (as well as by the legacy --relay-log-recovery).
It's not a big deal to memorize the last SBM obtained and use it to compute an estimate while the first event after the slave restart is still on its way.
In case the last SBM is unknown (after a slave server restart), a maximum SBM would be displayed (rather than zero, as currently).
bnestere, what would you say?

VAROQUI Stephane added a comment - 2023-02-01 13:38

"memorize the last gained SBM" is not valid for a STOP SLAVE with no delay where the next event is a long query.

To preserve the definition of SBM = time difference between the last event in the queue and the oldest event in the queue being COMMITTED,
I'm still curious: wouldn't a heartbeat from the leader, enriched with the timestamp of the last binary log event, be more accurate? A slave would not start fetching events before the first heartbeat. The SBM definition would then become the time difference between the last event on the leader and the oldest event COMMITTED in the queue.
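
The two SBM definitions contrasted in this comment can be put side by side. This is a rough sketch under stated assumptions: the function names are hypothetical, the queue is assumed to hold event timestamps ordered oldest first with the first entry being the one currently committed, and the leader's last event timestamp is assumed to arrive via the heartbeat.

```python
from typing import Optional, Sequence


def sbm_from_queue(queued_event_ts: Sequence[int]) -> Optional[int]:
    """Last event in the slave's queue minus the oldest event being committed."""
    if not queued_event_ts:
        return None  # nothing queued yet, so the lag is unknown
    return queued_event_ts[-1] - queued_event_ts[0]


def sbm_from_leader(leader_last_event_ts: int,
                    queued_event_ts: Sequence[int]) -> Optional[int]:
    """Last event on the leader minus the oldest event being committed."""
    if not queued_event_ts:
        return None
    return max(0, leader_last_event_ts - queued_event_ts[0])


queue = [100, 160, 220]             # timestamps already fetched by the slave
print(sbm_from_queue(queue))        # 120
print(sbm_from_leader(400, queue))  # 300
```

In this example the queue-based value misses the events the leader has written but the slave has not fetched yet, which is the gap the heartbeat timestamp is meant to close.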

VAROQUI Stephane added a comment - 2023-02-01 13:49

Andrei, I have checked: the state Seconds_Behind_Master: NULL with Slave_IO_Running: No already exists, so it will not break any tool. So initializing SBM to NULL and transitioning Slave_IO_Running to Yes after the first event is fetched is correct.

Andrei Elkin added a comment - 2023-02-01 17:55 - edited

stephane@skysql.com, regarding the use of the heartbeat (HB), your direction is great. Just not the HB itself: when necessary, it is feasible to add something like what you propose to the master-slave connection handshake.
E.g. when the slave service is started for the first time on the (restarted) slave server. With the recommended CHANGE MASTER ... master_use_gtid = slave_pos, at handshake time the slave would receive back the end-of-transaction timestamp corresponding to its last executed GTID (without this timestamp, the slave would be aware only of the GTID details of its last executed transaction).
This measure refines 'a maximum SBM'.

VAROQUI Stephane added a comment - 2023-02-01 18:06

handshake to both of you so

VAROQUI Stephane added a comment - 2023-02-01 18:14

At the same time, handshake + heartbeat would refine SBM by accounting for network time and long-waiting requests.

Andrei Elkin made changes - 2023-02-16 17:17

Priority

Critical [ 2 ]

Major [ 3 ]

Julien Fritsch made changes - 2023-04-27 14:25

Fix Version/s

10.3 [ 22126 ]

Brandon Nesterenko made changes - 2023-07-19 20:25

Link

This issue relates to MDEV-31745 [ MDEV-31745 ]

Brandon Nesterenko made changes - 2023-09-27 22:26

Link

This issue relates to MDEV-32265 [ MDEV-32265 ]

Julien Fritsch made changes - 2024-09-10 15:05

Fix Version/s

10.4 [ 22408 ]

Brandon Nesterenko made changes - 2025-01-20 13:42

Link

This issue relates to MDEV-30619 [ MDEV-30619 ]

People

Assignee: Brandon Nesterenko

Reporter: VAROQUI Stephane

Votes: 1

Watchers: 11

Dates

Created: 2018-10-22 09:02

Updated: 2025-01-20 13:42
