[MDEV-33856] Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps - Jira

Details

Type: New Feature
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Fix Version/s: 11.6.0
Component/s: Information Schema, Replication
Labels:
- Preview_11.6

Description

The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at:
https://mariadb.com/kb/en/delayed-replication/

The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

The suggested changes are:

Parallel execution uses the time AFTER execution while 'normal' replication used the value BEFORE to calculate Seconds_Behind_Master. This should be changed to always use AFTER.
Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: Newest event completion time from master (when writing to the relay log) - Completion time for last committed event from the master. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

As an example with SQL_DELAY=5 hours:

IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

In this case the Seconds_Behind_master will be 5 hours.

The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:

Without and without delayed replication
When the master has been down and is just coming up.
If the master has been 'idle' for some time.
When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting Newest event completion time from master as 'master current time' when connecting.

Note too that the final commit also adds in the table information_schema.slave_status (~~MDEV-33526~~), as an alias for SHOW ALL SLAVES STATUS. So these fields can now be queried by SELECT statements.

Attachments

Issue Links

causes

MDEV-34354 Shows replication time difference in master_last_event_time after setting MASTER_DELAY > 0 in chain replication

Closed

MDEV-34420 information_schema.slave_status Bad String Cut-off

Closed

MDEV-34752 Create alias replica_status for INFORMATION_SCHEMA.slave_status

Open

MDEV-34765 rpl.master_last_event_time_stmt fails with Result Length Mismatch

Closed

MDEV-35939 rpl.rpl_parallel_sbm: "Slave_last_event_time is not equal to Master_last_event_time"

Closed

MDEV-36001 DBUG_ASSERT(master_timestamp >= slave_timestamp) fails in rpl_gtid_crash

Open

includes

MDEV-32172 introspect server's replication settings from SQL stored routines

Closed

MDEV-33526 Create IS.slave_status table as alias for show replica status command

Closed

relates to

MDEV-11123 `Seconds_Behind_Master` is not accessible through `information_schema`

Closed

MDEV-13590 Convert SHOW master/slave statments to information_schema plugin

Open

MDEV-34843 Seconds_Behind_Master is not zero after slave syncs up with master

Open

MDEV-35858 rpl.rpl_parallel_sbm: Slave_last_event_time is not equal to Master_last_event_time

Open

(1 causes, 2 includes, 4 relates to)

Activity

Ascending order - Click to sort in descending order

Michael Widenius created issue - 2024-04-08 15:54

Michael Widenius made changes - 2024-04-08 15:55

Field	Original Value	New Value
Key	~~MENT-2063~~	~~MDEV-33856~~
Project	MariaDB Enterprise [ 11500 ]	MariaDB Server [ 10000 ]

Michael Widenius made changes - 2024-04-08 15:56

Fix Version/s

11.5 [ 29506 ]

Andrei Elkin made changes - 2024-04-08 16:12

Assignee

Andrei Elkin [ elkin ]

Michael Widenius added a comment - 2024-04-09 08:25

Another solutions for how we could adjust times when calculating Seconds_Behind_master:

Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min)
Last commit on slave was 2023-01-01 00:00:00
Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave.
currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state.
However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00
The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00.
From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query.

The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time.
The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.

Michael Widenius added a comment - 2024-04-09 08:25 Another solutions for how we could adjust times when calculating Seconds_Behind_master: Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min) Last commit on slave was 2023-01-01 00:00:00 Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave. currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state. However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00 The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00. From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query. The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time. The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.

Michael Widenius made changes - 2024-04-09 08:27

Description

The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at:
https://mariadb.com/kb/en/delayed-replication/

The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

The suggested changes are:
* Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
* Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

As an example with SQL_DELAY=5 hours:
- IO thread reads an event that was completed at 2023:01:01 06:01:01
- The SQL thread commits an event that was originally committed at 2023:01:01 01:01:01.

In this case the Seconds_Behind_master is 5 hours.

The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
* Without and without delayed replication
* When the master has been down and is just coming up.
* If the master has been 'idle' for some time.
* When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.

The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at:
https://mariadb.com/kb/en/delayed-replication/

The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

The suggested changes are:
* Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
* Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

As an example with SQL_DELAY=5 hours:
- IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
- The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

In this case the Seconds_Behind_master will be 5 hours.

The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
* Without and without delayed replication
* When the master has been down and is just coming up.
* If the master has been 'idle' for some time.
* When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.

Alexander Keremidarski added a comment - 2024-04-10 11:02 - edited

Please consider also higher priority for MDEV-32446
In my opinion "N transactions behind master" can be very helpful along with seconds_behind_master.
A DBA should be less concerned about long replication lag if there is only a single or couple of long transactions causing it while it can be a concern if the lag is within seconds, but the slave is hundreds and thousands of transacions behind.
GTIDs_behind_master in slave status can also help when the seconds_behind_master is not understood well.
Consider an example with:
sesconds_behind_master = 2753
GTIDs_behind_master = 1

Alexander Keremidarski added a comment - 2024-04-10 11:02 - edited Please consider also higher priority for MDEV-32446 In my opinion "N transactions behind master" can be very helpful along with seconds_behind_master. A DBA should be less concerned about long replication lag if there is only a single or couple of long transactions causing it while it can be a concern if the lag is within seconds, but the slave is hundreds and thousands of transacions behind. GTIDs_behind_master in slave status can also help when the seconds_behind_master is not understood well. Consider an example with: sesconds_behind_master = 2753 GTIDs_behind_master = 1

Andrei Elkin added a comment - 2024-04-10 14:18 - edited

salle, the MDEV-32446 concern is actually addressed, and for a single gtid domain it's

GTIDs_behind_master = ShowSlaveStatus::Gtid_IO_Pos - @@global.gtid_slave_pos

The relation is not generally true, but at least it approximates (still may not be true if the application messes with @@session.gtid_*) reliably for normal cases of gtid_strict_mode is ON.

Andrei Elkin added a comment - 2024-04-10 14:18 - edited salle , the MDEV-32446 concern is actually addressed, and for a single gtid domain it's GTIDs_behind_master = ShowSlaveStatus::Gtid_IO_Pos - @@global.gtid_slave_pos The relation is not generally true, but at least it approximates (still may not be true if the application messes with @@session.gtid_* ) reliably for normal cases of gtid_strict_mode is ON.

Ralf Gebhardt made changes - 2024-04-25 07:30

Fix Version/s		11.6 [ 29515 ]
Fix Version/s	11.5 [ 29506 ]

Ralf Gebhardt made changes - 2024-04-30 06:43

Issue Type

Task [ 3 ]

New Feature [ 2 ]

Michael Widenius added a comment - 2024-05-02 14:04

The problem with GTID's_behind_master is that it is not very useful for delayed slave.

Michael Widenius added a comment - 2024-05-02 14:04 The problem with GTID's_behind_master is that it is not very useful for delayed slave.

Michael Widenius added a comment - 2024-05-02 14:15 - edited

To not cause issues with old applications and still be able to provide
a fix for 10.6, my suggestion is to add a few new slave status
variables to fix the shortcomings the current definition of
Seconds_behind_master.
In 11.6 we could then deprecate Seconds_behind_master and encourage
users to switch to monitor the new variables.

Master_last_event_time Master timestamp of the last event read from the
master. Is NULL when no active replication.
Slave_state_time Master timestamp of event to be executed on the
slave - 1 second. If no events, then same as
Master_last_event_time.
In case of delayed slave, the Master timestamp
of the next to be executed event - 1 second.
This can be interpreted as 'the slave is
a snapshot of the master at this time'.
Master_slave_time_diff Difference between the above two events. NULL
if Master_last_event_time is not set.

The above definitions works both for normal slaves and delayed slaves.

As an example with SQL_DELAY=5 hours:

Master starts sending events from a binary log with old data:

Slave_state_time is first set to 'first event - 1 second'
Slave_state_time will be update before each event is executed and
the Master_slave_time_diff will eventually reach 5 hours at the same
time as Master_last_event_time reaches 'current clock time'.
It will grow up to '5 hours + execution time of last statement' and
then go down to 5 hours, if there are other events waiting to
be executed.

When master stops/pauses sending events;

Slave_state_time will over time go up to be closer to Master_last_event_time
and Master_slave_time_diff will eventually become 0.
(As Slave is fully in sync with Master).

If there are new events coming in after 5 hours,

The Slave_state_time will slowly start growing up from 0 to 5 hours +
execution time for longest delayed item.

More concrete example.

Assuming a new master with no binary log events connecting to a new slave:

IO thread reads an event that was completed at 2023:01:01 01:01:01
Slave reads event
Updates Master_last_event_time to 2023:01:01 01:01:01
Puts event in Delay queue
Updates Slave_state_time to 2023:01:01 01:01:00

IO thread reads an event that was completed at 2023:01:01 02:01:01
Slave reads event
Updates Master_last_event_time to 2023:01:01 02:01:01
Puts event in Delay queue
Master_slave_time_diff is now 3601

Clock reaches 2023:01:01 06:01:01:
Master_slave_time_diff is now 60*5+1 = 18001
Event from 2023:01:01 01:01:01 is executed.
Slave_state_time is now set to 2023:01:01 02:01:00 (as this is
1 second less than the next event to be executed).
Master_slave_time_diff is now 60*4+1 = 14401

Instead of having 1 second difference, we could have higher precision
and use 1/1000 of a second difference. The logic is still the same.

Michael Widenius added a comment - 2024-05-02 14:15 - edited To not cause issues with old applications and still be able to provide a fix for 10.6, my suggestion is to add a few new slave status variables to fix the shortcomings the current definition of Seconds_behind_master. In 11.6 we could then deprecate Seconds_behind_master and encourage users to switch to monitor the new variables. Master_last_event_time Master timestamp of the last event read from the master. Is NULL when no active replication. Slave_state_time Master timestamp of event to be executed on the slave - 1 second. If no events, then same as Master_last_event_time. In case of delayed slave, the Master timestamp of the next to be executed event - 1 second. This can be interpreted as 'the slave is a snapshot of the master at this time'. Master_slave_time_diff Difference between the above two events. NULL if Master_last_event_time is not set. The above definitions works both for normal slaves and delayed slaves. As an example with SQL_DELAY=5 hours: Master starts sending events from a binary log with old data: Slave_state_time is first set to 'first event - 1 second' Slave_state_time will be update before each event is executed and the Master_slave_time_diff will eventually reach 5 hours at the same time as Master_last_event_time reaches 'current clock time'. It will grow up to '5 hours + execution time of last statement' and then go down to 5 hours, if there are other events waiting to be executed. When master stops/pauses sending events; Slave_state_time will over time go up to be closer to Master_last_event_time and Master_slave_time_diff will eventually become 0. (As Slave is fully in sync with Master). If there are new events coming in after 5 hours, The Slave_state_time will slowly start growing up from 0 to 5 hours + execution time for longest delayed item. More concrete example. Assuming a new master with no binary log events connecting to a new slave: IO thread reads an event that was completed at 2023:01:01 01:01:01 Slave reads event Updates Master_last_event_time to 2023:01:01 01:01:01 Puts event in Delay queue Updates Slave_state_time to 2023:01:01 01:01:00 IO thread reads an event that was completed at 2023:01:01 02:01:01 Slave reads event Updates Master_last_event_time to 2023:01:01 02:01:01 Puts event in Delay queue Master_slave_time_diff is now 3601 Clock reaches 2023:01:01 06:01:01: Master_slave_time_diff is now 60*5+1 = 18001 Event from 2023:01:01 01:01:01 is executed. Slave_state_time is now set to 2023:01:01 02:01:00 (as this is 1 second less than the next event to be executed). Master_slave_time_diff is now 60*4+1 = 14401 Instead of having 1 second difference, we could have higher precision and use 1/1000 of a second difference. The logic is still the same.

Andrei Elkin added a comment - 2024-05-07 16:45

monty, I really like the idea which In essence it is about presenting "slave is behind" as the subtract between
the last queued event' timestamp and the first queued (which is also next to execute) one's timestamp.
So when the IO thread has received the last event g_100 , and the applier (SQL) thread is about to take on g_2, the queue
would look like [ g_100, g_99, .... g_2 ], and timestamp(g_100) - timestamp(g_2) is the value of how much behind (I simplify here with omitting -1 or 1/1000 etc).

This is all good also 'cos
A. your method is not interested in the slave local time;
B. it also covers the zero-delayed and it's not difficult to see how it would do well with the parallel applier.

Fyi, bnestere.

Andrei Elkin added a comment - 2024-05-07 16:45 monty , I really like the idea which In essence it is about presenting "slave is behind" as the subtract between the last queued event' timestamp and the first queued (which is also next to execute) one's timestamp. So when the IO thread has received the last event g_100 , and the applier (SQL) thread is about to take on g_2 , the queue would look like [ g_100, g_99, .... g_2 ] , and timestamp(g_100) - timestamp(g_2) is the value of how much behind (I simplify here with omitting -1 or 1/1000 etc). This is all good also 'cos A. your method is not interested in the slave local time; B. it also covers the zero-delayed and it's not difficult to see how it would do well with the parallel applier. Fyi, bnestere .

Michael Widenius made changes - 2024-05-14 18:54

Assignee

Andrei Elkin [ elkin ]

Michael Widenius [ monty ]

Michael Widenius made changes - 2024-05-14 18:54

Status

Open [ 1 ]

In Progress [ 3 ]

Brandon Nesterenko made changes - 2024-06-05 14:06

Status

In Progress [ 3 ]

In Testing [ 10301 ]

Brandon Nesterenko made changes - 2024-06-05 14:06

Assignee

Michael Widenius [ monty ]

Ramesh Sivaraman [ JIRAUSER48189 ]

Brandon Nesterenko added a comment - 2024-06-05 14:07

Hi ramesh!

Thanks for stepping in to test this! The latest patch is currently on bb-10.6-monty

Brandon Nesterenko added a comment - 2024-06-05 14:07 Hi ramesh ! Thanks for stepping in to test this! The latest patch is currently on bb-10.6-monty

Sergei Golubchik made changes - 2024-06-12 12:33

Link

This issue is part of TODO-4765 [ TODO-4765 ]

Marko Mäkelä added a comment - 2024-06-18 14:44

This appears to cause a massive amount of MemorySanitizer failures. The following change would seem to fix it:

diff --git a/sql/sql_class.h b/sql/sql_class.h

index af589243fbc..8802752bd26 100644

--- a/sql/sql_class.h

+++ b/sql/sql_class.h

@@ -3168,7 +3168,7 @@ class THD: public THD_count, /* this must be first */

   ulonglong  start_utime, utime_after_lock, utime_after_query;

   /* This can be used by handlers to send signals to the SQL level */

   ulonglong  replication_flags;

-  ulong orig_exec_time;              // Exec time for last read binlog event

+  my_time_t orig_exec_time= 0;       // Exec time for last read binlog event

   // Process indicator

   struct {

/*

diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc

index 747db369d75..2699ec773f8 100644

--- a/sql/log_event_server.cc

+++ b/sql/log_event_server.cc

@@ -1352,7 +1352,7 @@ Query_log_event::Query_log_event(THD* thd_arg, const char* query_arg,

*/

   exec_time= ((thd->slave_thread) ?

               thd->orig_exec_time :

-              (ulong) (my_time(0) - thd_arg->start_time));

+              (my_time(0) - thd_arg->start_time));

/**

     @todo this means that if we have no catalog, then it is replicated

index c2f56ea9771..69ba7cde440 100644

--- a/sql/log_event.h

+++ b/sql/log_event.h

@@ -1297,7 +1297,7 @@ class Log_event

   my_time_t when;

   ulong     when_sec_part;

   /* The number of seconds the query took to run on the master. */

-  ulong exec_time;

+  my_time_t exec_time;

   /* Number of bytes written by write() function */

   size_t data_written;

I took the liberty of redefining some data types consistently. The ulong type should be avoided, because its size differs between LP64 and LLP64 ABIs.

The second hunk identifies where the garbage was being copied to Query_log_event::exec_time. From there it would propagate to the local variable buf in Query_log_event::write().

Marko Mäkelä added a comment - 2024-06-18 14:44 This appears to cause a massive amount of MemorySanitizer failures. The following change would seem to fix it: diff --git a/sql/sql_class.h b/sql/sql_class.h index af589243fbc..8802752bd26 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -3168,7 +3168,7 @@ class THD: public THD_count, /* this must be first */ ulonglong start_utime, utime_after_lock, utime_after_query; /* This can be used by handlers to send signals to the SQL level */ ulonglong replication_flags; - ulong orig_exec_time; // Exec time for last read binlog event + my_time_t orig_exec_time= 0; // Exec time for last read binlog event // Process indicator struct { /* diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 747db369d75..2699ec773f8 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -1352,7 +1352,7 @@ Query_log_event::Query_log_event(THD* thd_arg, const char* query_arg, */ exec_time= ((thd->slave_thread) ? thd->orig_exec_time : - (ulong) (my_time(0) - thd_arg->start_time)); + (my_time(0) - thd_arg->start_time)); /** @todo this means that if we have no catalog, then it is replicated index c2f56ea9771..69ba7cde440 100644 --- a/sql/log_event.h +++ b/sql/log_event.h @@ -1297,7 +1297,7 @@ class Log_event my_time_t when; ulong when_sec_part; /* The number of seconds the query took to run on the master. */ - ulong exec_time; + my_time_t exec_time; /* Number of bytes written by write() function */ size_t data_written; I took the liberty of redefining some data types consistently. The ulong type should be avoided, because its size differs between LP64 and LLP64 ABIs. The second hunk identifies where the garbage was being copied to Query_log_event::exec_time . From there it would propagate to the local variable buf in Query_log_event::write() .

Brandon Nesterenko made changes - 2024-06-18 21:39

Link

This issue causes ~~MDEV-34420~~ [ ~~MDEV-34420~~ ]

Roel Van de Paar added a comment - 2024-06-19 04:21

bnestere Could you please review Marko's input and push a final patch to the test branch? (I am assisting Ramesh with testing this)
Thank you!

Roel Van de Paar added a comment - 2024-06-19 04:21 bnestere Could you please review Marko's input and push a final patch to the test branch? (I am assisting Ramesh with testing this) Thank you!

Brandon Nesterenko added a comment - 2024-06-19 18:26

Hi Roel!

It is already done note that Monty created a new branch to house this feature: bb-10.6-master-timestamp.

It is a little strange, but right now the patch exists in two places: bb-10.6-master-timestamp (which is to be ported to 10.6 enterprise), and preview-11.6-preview. I'll have to check in with Monty to see if the plan is to create a custom build for the customer that wants this, as I imagine that will affect your testing (I imagine both versions will need testing).

So I think you and ramesh can switch to testing on the 11.6 preview branch until we port the patch to enterprise. Another thing to note is that this last weekend, Monty added an information_schema.slave_status table to provide an easier-to-use interface for SHOW SLAVE STATUS (though note it is currently affected by ~~MDEV-34420~~).

Brandon Nesterenko added a comment - 2024-06-19 18:26 Hi Roel ! It is already done note that Monty created a new branch to house this feature: bb-10.6-master-timestamp . It is a little strange, but right now the patch exists in two places: bb-10.6-master-timestamp (which is to be ported to 10.6 enterprise), and preview-11.6-preview . I'll have to check in with Monty to see if the plan is to create a custom build for the customer that wants this, as I imagine that will affect your testing (I imagine both versions will need testing). So I think you and ramesh can switch to testing on the 11.6 preview branch until we port the patch to enterprise. Another thing to note is that this last weekend, Monty added an information_schema.slave_status table to provide an easier-to-use interface for SHOW SLAVE STATUS (though note it is currently affected by MDEV-34420 ).

Brandon Nesterenko made changes - 2024-06-19 18:28

Link

This issue relates to MENT-2095 [ MENT-2095 ]

Roel Van de Paar added a comment - 2024-06-19 23:39 - edited

Thank you bnestere! I see development is ongoing in the branch. Yes, please let us know once the ES port branch is ready for 10.6 testing.

For 11.6, it would be helpful to have a branch that has only this feature on top of 11.6 trunk, is it available?

In the meantime, I have started testing preview-11.6-preview.

Roel Van de Paar added a comment - 2024-06-19 23:39 - edited Thank you bnestere ! I see development is ongoing in the branch. Yes, please let us know once the ES port branch is ready for 10.6 testing. For 11.6, it would be helpful to have a branch that has only this feature on top of 11.6 trunk, is it available? In the meantime, I have started testing preview-11.6-preview.

Marko Mäkelä added a comment - 2024-06-20 12:42

According to monty, the initialization of THD::orig_exec_time was intentionally absent. A large number of failures would be fixed by the following patch, but at least the replication of LOAD DATA seems to require a further fix.

diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc

index 51ca3b183ec..d279761767a 100644

--- a/sql/rpl_parallel.cc

+++ b/sql/rpl_parallel.cc

@@ -60,7 +60,10 @@ rpt_handle_event(rpl_parallel_thread::queued_event *qev,

   rgi->future_event_relay_log_pos= qev->future_event_relay_log_pos;

   strcpy(rgi->future_event_master_log_name, qev->future_event_master_log_name);

   if (event_can_update_last_master_timestamp(ev))

+  {

     rgi->last_master_timestamp= ev->when + ev->exec_time;

+    thd->orig_exec_time= ev->exec_time;

+  }

   err= apply_event_and_update_pos_for_parallel(ev, thd, rgi);

   rli->executed_entries++;

Marko Mäkelä added a comment - 2024-06-20 12:42 According to monty , the initialization of THD::orig_exec_time was intentionally absent. A large number of failures would be fixed by the following patch, but at least the replication of LOAD DATA seems to require a further fix. diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc index 51ca3b183ec..d279761767a 100644 --- a/sql/rpl_parallel.cc +++ b/sql/rpl_parallel.cc @@ -60,7 +60,10 @@ rpt_handle_event(rpl_parallel_thread::queued_event *qev, rgi->future_event_relay_log_pos= qev->future_event_relay_log_pos; strcpy(rgi->future_event_master_log_name, qev->future_event_master_log_name); if (event_can_update_last_master_timestamp(ev)) + { rgi->last_master_timestamp= ev->when + ev->exec_time; + thd->orig_exec_time= ev->exec_time; + } err= apply_event_and_update_pos_for_parallel(ev, thd, rgi); rli->executed_entries++;

Federico Razzoli added a comment - 2024-06-21 00:15

Which value are we expected to see in the case below?

The IO thread was stopped 1 hour ago and it's 8:00
The time of the last commit in the relay log is 7:00
The time after the last applied event is 6:00

Federico Razzoli added a comment - 2024-06-21 00:15 Which value are we expected to see in the case below? The IO thread was stopped 1 hour ago and it's 8:00 The time of the last commit in the relay log is 7:00 The time after the last applied event is 6:00

Brandon Nesterenko added a comment - 2024-06-21 15:49

Hi f_razzoli!

Note that the current time seen on the slave is not considered in these new fields. All that is showed is the time on the event in the binary log (which will be the time was executed on the master).

Master_last_event_time: 7:00. // This field will be updated when the IO thread receives an event from the master.
Slave_last_event_time: 6:00. // This field will be updated when a transaction commits (by the SQL thread/worker threads)
Master_Slave_time_diff: 60 // A field for convenience to show the difference between Master_last_event_time and Slave_last_event_time in seconds.

Brandon Nesterenko added a comment - 2024-06-21 15:49 Hi f_razzoli ! Note that the current time seen on the slave is not considered in these new fields. All that is showed is the time on the event in the binary log (which will be the time was executed on the master). Master_last_event_time: 7:00. // This field will be updated when the IO thread receives an event from the master. Slave_last_event_time: 6:00. // This field will be updated when a transaction commits (by the SQL thread/worker threads) Master_Slave_time_diff: 60 // A field for convenience to show the difference between Master_last_event_time and Slave_last_event_time in seconds.

Ralf Gebhardt made changes - 2024-06-25 06:52

Labels

Preview_11.6

Sergei Golubchik made changes - 2024-07-02 16:11

Link

This issue includes ~~MDEV-33526~~ [ ~~MDEV-33526~~ ]

Jira Automation (IT) made changes - 2024-07-04 00:11

Zendesk Related Tickets		201691
Zendesk active tickets		201691

Ramesh Sivaraman added a comment - 2024-07-08 05:25 - edited

ok to push the 11.6 version only (this was tested using preview-11.6-preview).

For 10.6, bb-10.6-monty was tested previously, but not bb-10.6-master-timestamp yet.

bnestere- Please create a new ticket for the custom build (presumably based on the updated bb-10.6-master-timestamp), once it's ready for testing. The build and branch will need separate testing.

Ramesh Sivaraman added a comment - 2024-07-08 05:25 - edited ok to push the 11.6 version only (this was tested using preview-11.6-preview). For 10.6, bb-10.6-monty was tested previously, but not bb-10.6-master-timestamp yet. bnestere - Please create a new ticket for the custom build (presumably based on the updated bb-10.6-master-timestamp), once it's ready for testing. The build and branch will need separate testing.

Ramesh Sivaraman made changes - 2024-07-08 05:25

Assignee	Ramesh Sivaraman [ JIRAUSER48189 ]	Brandon Nesterenko [ JIRAUSER48702 ]
Status	In Testing [ 10301 ]	Stalled [ 10000 ]

Brandon Nesterenko made changes - 2024-07-08 17:45

Link

This issue relates to ~~MDEV-11123~~ [ ~~MDEV-11123~~ ]

Brandon Nesterenko made changes - 2024-07-08 18:12

Link

This issue relates to MDEV-13590 [ MDEV-13590 ]

Michael Widenius added a comment - 2024-07-10 12:16

Newest code, all merged to one commit, can be found at bb-11.6-~~MDEV-33856~~-seconds_behind_master

Michael Widenius added a comment - 2024-07-10 12:16 Newest code, all merged to one commit, can be found at bb-11.6- MDEV-33856 -seconds_behind_master

Michael Widenius added a comment - 2024-07-10 12:28

A summary of the commit:

The following new variables replaces 'seconds_behind_master':

Master_last_event_time Timestamp of the last committed event read from the master by the IO thread.
Slave_last_event_time Master timestamp of the last event committed on the slave.
Master_Slave_time_diff The difference of the above two timestamps.

In addition SLAVE STATUS is now part of information_schema.slave_status. One can now query this table instead of using SHOW SLAVE STATUS or SHOW ALL SLAVES STATUS.
One benefit of information_schema.slave_status is that it is now easy to just extract some of the fields instead of having to always get all fields as one gets with SHOW SLAVE STATUS.

Michael Widenius added a comment - 2024-07-10 12:28 A summary of the commit: The following new variables replaces 'seconds_behind_master': Master_last_event_time Timestamp of the last committed event read from the master by the IO thread. Slave_last_event_time Master timestamp of the last event committed on the slave. Master_Slave_time_diff The difference of the above two timestamps. In addition SLAVE STATUS is now part of information_schema.slave_status. One can now query this table instead of using SHOW SLAVE STATUS or SHOW ALL SLAVES STATUS. One benefit of information_schema.slave_status is that it is now easy to just extract some of the fields instead of having to always get all fields as one gets with SHOW SLAVE STATUS.

Kristian Nielsen added a comment - 2024-07-22 18:05

Review: https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/LQZOIASN4VSOFQJXVDOOO5NCYOELEXOS/

Kristian Nielsen added a comment - 2024-07-22 18:05 Review: https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/LQZOIASN4VSOFQJXVDOOO5NCYOELEXOS/

Brandon Nesterenko made changes - 2024-07-23 18:14

Summary

New definition for Seconds_Behind_Master

Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps

Brandon Nesterenko added a comment - 2024-07-24 22:30

Hi knielsen!

I took over for Monty in making the edits for your review. Can you review the changes?

+ e1ee2ed0522...071dcfaeca5 bb-11.6-MDEV-33856-bnestere -> bb-11.6-MDEV-33856-bnestere

Brandon Nesterenko added a comment - 2024-07-24 22:30 Hi knielsen ! I took over for Monty in making the edits for your review. Can you review the changes? + e1ee2ed0522...071dcfaeca5 bb-11.6-MDEV-33856-bnestere -> bb-11.6-MDEV-33856-bnestere

Brandon Nesterenko made changes - 2024-07-24 22:30

Assignee	Brandon Nesterenko [ JIRAUSER48702 ]	Kristian Nielsen [ knielsen ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Kristian Nielsen added a comment - 2024-07-25 15:47

I had a quick look, looks ok, thanks.

- Kristian.

Kristian Nielsen added a comment - 2024-07-25 15:47 I had a quick look, looks ok, thanks. - Kristian.

Kristian Nielsen made changes - 2024-07-25 15:47

Assignee	Kristian Nielsen [ knielsen ]	Brandon Nesterenko [ JIRAUSER48702 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Brandon Nesterenko added a comment - 2024-07-31 12:25

Pushed on Monty's behalf into 11.6 as 25b5c63905682a0e38d8ed86170d98be6ddb8cf8

Brandon Nesterenko added a comment - 2024-07-31 12:25 Pushed on Monty's behalf into 11.6 as 25b5c63905682a0e38d8ed86170d98be6ddb8cf8

Brandon Nesterenko made changes - 2024-07-31 12:25

Component/s		Information Schema [ 14413 ]
Component/s		Replication [ 10100 ]
Fix Version/s		11.6.0 [ 29839 ]
Fix Version/s	11.6 [ 29515 ]
Assignee	Brandon Nesterenko [ JIRAUSER48702 ]	Michael Widenius [ monty ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Brandon Nesterenko made changes - 2024-07-31 12:34

Description

The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at:
https://mariadb.com/kb/en/delayed-replication/

The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

The suggested changes are:
* Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
* Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

As an example with SQL_DELAY=5 hours:
- IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
- The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

In this case the Seconds_Behind_master will be 5 hours.

The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
* Without and without delayed replication
* When the master has been down and is just coming up.
* If the master has been 'idle' for some time.
* When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.

The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at:
https://mariadb.com/kb/en/delayed-replication/

The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

The suggested changes are:
* Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
* Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

As an example with SQL_DELAY=5 hours:
- IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
- The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

In this case the Seconds_Behind_master will be 5 hours.

The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
* Without and without delayed replication
* When the master has been down and is just coming up.
* If the master has been 'idle' for some time.
* When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.

Note too that the final commit also adds in the table information_schema.slave_status (~~MDEV-33526~~), as an alias for SHOW ALL SLAVES STATUS. So these fields can now be queried by SELECT statements.

Brandon Nesterenko made changes - 2024-07-31 12:42

Link

This issue relates to ~~MDEV-32172~~ [ ~~MDEV-32172~~ ]

Brandon Nesterenko made changes - 2024-07-31 18:35

Link

This issue relates to ~~MDEV-32172~~ [ ~~MDEV-32172~~ ]

Brandon Nesterenko made changes - 2024-07-31 18:35

Link

This issue includes ~~MDEV-32172~~ [ ~~MDEV-32172~~ ]

Brandon Nesterenko made changes - 2024-08-13 17:08

Link

This issue causes MDEV-34752 [ MDEV-34752 ]

Brandon Nesterenko made changes - 2024-08-16 20:00

Link

This issue causes ~~MDEV-34765~~ [ ~~MDEV-34765~~ ]

Brandon Nesterenko made changes - 2024-08-27 21:17

Link

This issue causes ~~MDEV-34354~~ [ ~~MDEV-34354~~ ]

Ralf Gebhardt made changes - 2024-10-01 20:52

Link

This issue relates to MENT-2141 [ MENT-2141 ]

Ralf Gebhardt made changes - 2024-10-31 10:34

Link

This issue is part of TODO-4980 [ TODO-4980 ]

Brandon Nesterenko made changes - 2024-12-12 16:35

Link

This issue relates to TODO-5070 [ TODO-5070 ]

Andrei Elkin made changes - 2024-12-19 13:34

Link

This issue causes ~~MDEV-35693~~ [ ~~MDEV-35693~~ ]

Brandon Nesterenko made changes - 2025-01-15 17:05

Link

This issue relates to MDEV-35858 [ MDEV-35858 ]

Jimmy Hú made changes - 2025-01-16 19:40

Link

This issue causes ~~MDEV-35693~~ [ ~~MDEV-35693~~ ]

Susil Behera made changes - 2025-01-21 14:17

Link

This issue relates to MDEV-34843 [ MDEV-34843 ]

Brandon Nesterenko made changes - 2025-01-25 22:59

Link

This issue causes ~~MDEV-35939~~ [ ~~MDEV-35939~~ ]

Jimmy Hú made changes - 2025-01-31 20:31

Link

This issue causes MDEV-36001 [ MDEV-36001 ]

People

Assignee:: Michael Widenius

Reporter:: Michael Widenius

Votes:: 3 Vote for this issue

Watchers:: 14 Start watching this issue

Dates

Created:: 2024-04-08 15:54

Updated:: 2025-03-13 11:40

Resolved:: 2024-07-31 12:25

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration