The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.
The suggested changes are:
Parallel execution uses the time AFTER execution while 'normal' replication used the value BEFORE to calculate Seconds_Behind_Master. This should be changed to always use AFTER.
Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: Newest event completion time from master (when writing to the relay log) - Completion time for last committed event from the master. If the slave is 'idle' then the Seconds_Behind_Master should be 0.
As an example with SQL_DELAY=5 hours:
IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01
In this case the Seconds_Behind_master will be 5 hours.
The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
Without and without delayed replication
When the master has been down and is just coming up.
If the master has been 'idle' for some time.
When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting Newest event completion time from master as 'master current time' when connecting.
Note too that the final commit also adds in the table information_schema.slave_status (MDEV-33526), as an alias for SHOW ALL SLAVES STATUS. So these fields can now be queried by SELECT statements.
Attachments
Issue Links
causes
MDEV-34354Shows replication time difference in master_last_event_time after setting MASTER_DELAY > 0 in chain replication
Closed
MDEV-34420information_schema.slave_status Bad String Cut-off
Closed
MDEV-34752Create alias replica_status for INFORMATION_SCHEMA.slave_status
Open
MDEV-34765rpl.master_last_event_time_stmt fails with Result Length Mismatch
Closed
MDEV-35939rpl.rpl_parallel_sbm: "Slave_last_event_time is not equal to Master_last_event_time"
Closed
MDEV-36001DBUG_ASSERT(master_timestamp >= slave_timestamp) fails in rpl_gtid_crash
Open
includes
MDEV-32172introspect server's replication settings from SQL stored routines
Closed
MDEV-33526Create IS.slave_status table as alias for show replica status command
Closed
relates to
MDEV-11123`Seconds_Behind_Master` is not accessible through `information_schema`
Closed
MDEV-13590Convert SHOW master/slave statments to information_schema plugin
Open
MDEV-34843Seconds_Behind_Master is not zero after slave syncs up with master
Open
MDEV-35858rpl.rpl_parallel_sbm: Slave_last_event_time is not equal to Master_last_event_time
Another solutions for how we could adjust times when calculating Seconds_Behind_master:
Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min)
Last commit on slave was 2023-01-01 00:00:00
Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave.
currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state.
However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00
The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00.
From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query.
The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time.
The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.
Michael Widenius
added a comment - Another solutions for how we could adjust times when calculating Seconds_Behind_master:
Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min)
Last commit on slave was 2023-01-01 00:00:00
Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave.
currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state.
However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00
The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00.
From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query.
The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time.
The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.
Please consider also higher priority for MDEV-32446
In my opinion "N transactions behind master" can be very helpful along with seconds_behind_master.
A DBA should be less concerned about long replication lag if there is only a single or couple of long transactions causing it while it can be a concern if the lag is within seconds, but the slave is hundreds and thousands of transacions behind.
GTIDs_behind_master in slave status can also help when the seconds_behind_master is not understood well.
Consider an example with:
sesconds_behind_master = 2753
GTIDs_behind_master = 1
Alexander Keremidarski
added a comment - - edited Please consider also higher priority for MDEV-32446
In my opinion "N transactions behind master" can be very helpful along with seconds_behind_master.
A DBA should be less concerned about long replication lag if there is only a single or couple of long transactions causing it while it can be a concern if the lag is within seconds, but the slave is hundreds and thousands of transacions behind.
GTIDs_behind_master in slave status can also help when the seconds_behind_master is not understood well.
Consider an example with:
sesconds_behind_master = 2753
GTIDs_behind_master = 1
The relation is not generally true, but at least it approximates (still may not be true if the application messes with @@session.gtid_*) reliably for normal cases of gtid_strict_mode is ON.
Andrei Elkin
added a comment - - edited salle , the MDEV-32446 concern is actually addressed, and for a single gtid domain it's
GTIDs_behind_master = ShowSlaveStatus::Gtid_IO_Pos - @@global.gtid_slave_pos
The relation is not generally true, but at least it approximates (still may not be true if the application messes with @@session.gtid_* ) reliably for normal cases of gtid_strict_mode is ON.
To not cause issues with old applications and still be able to provide
a fix for 10.6, my suggestion is to add a few new slave status
variables to fix the shortcomings the current definition of
Seconds_behind_master.
In 11.6 we could then deprecate Seconds_behind_master and encourage
users to switch to monitor the new variables.
Master_last_event_time Master timestamp of the last event read from the
master. Is NULL when no active replication.
Slave_state_time Master timestamp of event to be executed on the
slave - 1 second. If no events, then same as
Master_last_event_time.
In case of delayed slave, the Master timestamp
of the next to be executed event - 1 second.
This can be interpreted as 'the slave is
a snapshot of the master at this time'.
Master_slave_time_diff Difference between the above two events. NULL
if Master_last_event_time is not set.
The above definitions works both for normal slaves and delayed slaves.
As an example with SQL_DELAY=5 hours:
Master starts sending events from a binary log with old data:
Slave_state_time is first set to 'first event - 1 second'
Slave_state_time will be update before each event is executed and
the Master_slave_time_diff will eventually reach 5 hours at the same
time as Master_last_event_time reaches 'current clock time'.
It will grow up to '5 hours + execution time of last statement' and
then go down to 5 hours, if there are other events waiting to
be executed.
When master stops/pauses sending events;
Slave_state_time will over time go up to be closer to Master_last_event_time
and Master_slave_time_diff will eventually become 0.
(As Slave is fully in sync with Master).
If there are new events coming in after 5 hours,
The Slave_state_time will slowly start growing up from 0 to 5 hours +
execution time for longest delayed item.
More concrete example.
Assuming a new master with no binary log events connecting to a new slave:
IO thread reads an event that was completed at 2023:01:01 01:01:01
Slave reads event
Updates Master_last_event_time to 2023:01:01 01:01:01
Puts event in Delay queue
Updates Slave_state_time to 2023:01:01 01:01:00
IO thread reads an event that was completed at 2023:01:01 02:01:01
Slave reads event
Updates Master_last_event_time to 2023:01:01 02:01:01
Puts event in Delay queue
Master_slave_time_diff is now 3601
Clock reaches 2023:01:01 06:01:01:
Master_slave_time_diff is now 60*5+1 = 18001
Event from 2023:01:01 01:01:01 is executed.
Slave_state_time is now set to 2023:01:01 02:01:00 (as this is
1 second less than the next event to be executed).
Master_slave_time_diff is now 60*4+1 = 14401
Instead of having 1 second difference, we could have higher precision
and use 1/1000 of a second difference. The logic is still the same.
Michael Widenius
added a comment - - edited To not cause issues with old applications and still be able to provide
a fix for 10.6, my suggestion is to add a few new slave status
variables to fix the shortcomings the current definition of
Seconds_behind_master.
In 11.6 we could then deprecate Seconds_behind_master and encourage
users to switch to monitor the new variables.
Master_last_event_time Master timestamp of the last event read from the
master. Is NULL when no active replication.
Slave_state_time Master timestamp of event to be executed on the
slave - 1 second. If no events, then same as
Master_last_event_time.
In case of delayed slave, the Master timestamp
of the next to be executed event - 1 second.
This can be interpreted as 'the slave is
a snapshot of the master at this time'.
Master_slave_time_diff Difference between the above two events. NULL
if Master_last_event_time is not set.
The above definitions works both for normal slaves and delayed slaves.
As an example with SQL_DELAY=5 hours:
Master starts sending events from a binary log with old data:
Slave_state_time is first set to 'first event - 1 second'
Slave_state_time will be update before each event is executed and
the Master_slave_time_diff will eventually reach 5 hours at the same
time as Master_last_event_time reaches 'current clock time'.
It will grow up to '5 hours + execution time of last statement' and
then go down to 5 hours, if there are other events waiting to
be executed.
When master stops/pauses sending events;
Slave_state_time will over time go up to be closer to Master_last_event_time
and Master_slave_time_diff will eventually become 0.
(As Slave is fully in sync with Master).
If there are new events coming in after 5 hours,
The Slave_state_time will slowly start growing up from 0 to 5 hours +
execution time for longest delayed item.
More concrete example.
Assuming a new master with no binary log events connecting to a new slave:
IO thread reads an event that was completed at 2023:01:01 01:01:01
Slave reads event
Updates Master_last_event_time to 2023:01:01 01:01:01
Puts event in Delay queue
Updates Slave_state_time to 2023:01:01 01:01:00
IO thread reads an event that was completed at 2023:01:01 02:01:01
Slave reads event
Updates Master_last_event_time to 2023:01:01 02:01:01
Puts event in Delay queue
Master_slave_time_diff is now 3601
Clock reaches 2023:01:01 06:01:01:
Master_slave_time_diff is now 60*5+1 = 18001
Event from 2023:01:01 01:01:01 is executed.
Slave_state_time is now set to 2023:01:01 02:01:00 (as this is
1 second less than the next event to be executed).
Master_slave_time_diff is now 60*4+1 = 14401
Instead of having 1 second difference, we could have higher precision
and use 1/1000 of a second difference. The logic is still the same.
monty, I really like the idea which In essence it is about presenting "slave is behind" as the subtract between
the last queued event' timestamp and the first queued (which is also next to execute) one's timestamp.
So when the IO thread has received the last event g_100 , and the applier (SQL) thread is about to take on g_2, the queue
would look like [ g_100, g_99, .... g_2 ], and timestamp(g_100) - timestamp(g_2) is the value of how much behind (I simplify here with omitting -1 or 1/1000 etc).
This is all good also 'cos
A. your method is not interested in the slave local time;
B. it also covers the zero-delayed and it's not difficult to see how it would do well with the parallel applier.
Andrei Elkin
added a comment - monty , I really like the idea which In essence it is about presenting "slave is behind" as the subtract between
the last queued event' timestamp and the first queued (which is also next to execute) one's timestamp.
So when the IO thread has received the last event g_100 , and the applier (SQL) thread is about to take on g_2 , the queue
would look like [ g_100, g_99, .... g_2 ] , and timestamp(g_100) - timestamp(g_2) is the value of how much behind (I simplify here with omitting -1 or 1/1000 etc).
This is all good also 'cos
A. your method is not interested in the slave local time;
B. it also covers the zero-delayed and it's not difficult to see how it would do well with the parallel applier.
Fyi, bnestere .
@todo this means that if we have no catalog, then it is replicated
index c2f56ea9771..69ba7cde440 100644
--- a/sql/log_event.h
+++ b/sql/log_event.h
@@ -1297,7 +1297,7 @@ class Log_event
my_time_t when;
ulong when_sec_part;
/* The number of seconds the query took to run on the master. */
- ulong exec_time;
+ my_time_t exec_time;
/* Number of bytes written by write() function */
size_t data_written;
I took the liberty of redefining some data types consistently. The ulong type should be avoided, because its size differs between LP64 and LLP64 ABIs.
The second hunk identifies where the garbage was being copied to Query_log_event::exec_time. From there it would propagate to the local variable buf in Query_log_event::write().
Marko Mäkelä
added a comment - This appears to cause a massive amount of MemorySanitizer failures. The following change would seem to fix it:
diff --git a/sql/sql_class.h b/sql/sql_class.h
index af589243fbc..8802752bd26 100644
--- a/sql/sql_class.h
+++ b/sql/sql_class.h
@@ -3168,7 +3168,7 @@ class THD: public THD_count, /* this must be first */
ulonglong start_utime, utime_after_lock, utime_after_query;
/* This can be used by handlers to send signals to the SQL level */
ulonglong replication_flags;
- ulong orig_exec_time; // Exec time for last read binlog event
+ my_time_t orig_exec_time= 0; // Exec time for last read binlog event
// Process indicator
struct {
/*
diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
index 747db369d75..2699ec773f8 100644
--- a/sql/log_event_server.cc
+++ b/sql/log_event_server.cc
@@ -1352,7 +1352,7 @@ Query_log_event::Query_log_event(THD* thd_arg, const char* query_arg,
*/
exec_time= ((thd->slave_thread) ?
thd->orig_exec_time :
- (ulong) (my_time(0) - thd_arg->start_time));
+ (my_time(0) - thd_arg->start_time));
/**
@todo this means that if we have no catalog, then it is replicated
index c2f56ea9771..69ba7cde440 100644
--- a/sql/log_event.h
+++ b/sql/log_event.h
@@ -1297,7 +1297,7 @@ class Log_event
my_time_t when;
ulong when_sec_part;
/* The number of seconds the query took to run on the master. */
- ulong exec_time;
+ my_time_t exec_time;
/* Number of bytes written by write() function */
size_t data_written;
I took the liberty of redefining some data types consistently. The ulong type should be avoided, because its size differs between LP64 and LLP64 ABIs.
The second hunk identifies where the garbage was being copied to Query_log_event::exec_time . From there it would propagate to the local variable buf in Query_log_event::write() .
bnestere Could you please review Marko's input and push a final patch to the test branch? (I am assisting Ramesh with testing this)
Thank you!
Roel Van de Paar
added a comment - bnestere Could you please review Marko's input and push a final patch to the test branch? (I am assisting Ramesh with testing this)
Thank you!
It is already done note that Monty created a new branch to house this feature: bb-10.6-master-timestamp.
It is a little strange, but right now the patch exists in two places: bb-10.6-master-timestamp (which is to be ported to 10.6 enterprise), and preview-11.6-preview. I'll have to check in with Monty to see if the plan is to create a custom build for the customer that wants this, as I imagine that will affect your testing (I imagine both versions will need testing).
So I think you and ramesh can switch to testing on the 11.6 preview branch until we port the patch to enterprise. Another thing to note is that this last weekend, Monty added an information_schema.slave_status table to provide an easier-to-use interface for SHOW SLAVE STATUS (though note it is currently affected by MDEV-34420).
Brandon Nesterenko
added a comment - Hi Roel !
It is already done note that Monty created a new branch to house this feature: bb-10.6-master-timestamp .
It is a little strange, but right now the patch exists in two places: bb-10.6-master-timestamp (which is to be ported to 10.6 enterprise), and preview-11.6-preview . I'll have to check in with Monty to see if the plan is to create a custom build for the customer that wants this, as I imagine that will affect your testing (I imagine both versions will need testing).
So I think you and ramesh can switch to testing on the 11.6 preview branch until we port the patch to enterprise. Another thing to note is that this last weekend, Monty added an information_schema.slave_status table to provide an easier-to-use interface for SHOW SLAVE STATUS (though note it is currently affected by MDEV-34420 ).
Thank you bnestere! I see development is ongoing in the branch. Yes, please let us know once the ES port branch is ready for 10.6 testing.
For 11.6, it would be helpful to have a branch that has only this feature on top of 11.6 trunk, is it available?
In the meantime, I have started testing preview-11.6-preview.
Roel Van de Paar
added a comment - - edited Thank you bnestere ! I see development is ongoing in the branch. Yes, please let us know once the ES port branch is ready for 10.6 testing.
For 11.6, it would be helpful to have a branch that has only this feature on top of 11.6 trunk, is it available?
In the meantime, I have started testing preview-11.6-preview.
According to monty, the initialization of THD::orig_exec_time was intentionally absent. A large number of failures would be fixed by the following patch, but at least the replication of LOAD DATA seems to require a further fix.
Marko Mäkelä
added a comment - According to monty , the initialization of THD::orig_exec_time was intentionally absent. A large number of failures would be fixed by the following patch, but at least the replication of LOAD DATA seems to require a further fix.
diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
index 51ca3b183ec..d279761767a 100644
--- a/sql/rpl_parallel.cc
+++ b/sql/rpl_parallel.cc
@@ -60,7 +60,10 @@ rpt_handle_event(rpl_parallel_thread::queued_event *qev,
rgi->future_event_relay_log_pos= qev->future_event_relay_log_pos;
strcpy(rgi->future_event_master_log_name, qev->future_event_master_log_name);
if (event_can_update_last_master_timestamp(ev))
+ {
rgi->last_master_timestamp= ev->when + ev->exec_time;
+ thd->orig_exec_time= ev->exec_time;
+ }
err= apply_event_and_update_pos_for_parallel(ev, thd, rgi);
rli->executed_entries++;
Which value are we expected to see in the case below?
The IO thread was stopped 1 hour ago and it's 8:00
The time of the last commit in the relay log is 7:00
The time after the last applied event is 6:00
Federico Razzoli
added a comment - Which value are we expected to see in the case below?
The IO thread was stopped 1 hour ago and it's 8:00
The time of the last commit in the relay log is 7:00
The time after the last applied event is 6:00
Note that the current time seen on the slave is not considered in these new fields. All that is showed is the time on the event in the binary log (which will be the time was executed on the master).
Master_last_event_time: 7:00. // This field will be updated when the IO thread receives an event from the master.
Slave_last_event_time: 6:00. // This field will be updated when a transaction commits (by the SQL thread/worker threads)
Master_Slave_time_diff: 60 // A field for convenience to show the difference between Master_last_event_time and Slave_last_event_time in seconds.
Brandon Nesterenko
added a comment - Hi f_razzoli !
Note that the current time seen on the slave is not considered in these new fields. All that is showed is the time on the event in the binary log (which will be the time was executed on the master).
Master_last_event_time: 7:00. // This field will be updated when the IO thread receives an event from the master.
Slave_last_event_time: 6:00. // This field will be updated when a transaction commits (by the SQL thread/worker threads)
Master_Slave_time_diff: 60 // A field for convenience to show the difference between Master_last_event_time and Slave_last_event_time in seconds.
ok to push the 11.6 version only (this was tested using preview-11.6-preview).
For 10.6, bb-10.6-monty was tested previously, but not bb-10.6-master-timestamp yet.
bnestere- Please create a new ticket for the custom build (presumably based on the updated bb-10.6-master-timestamp), once it's ready for testing. The build and branch will need separate testing.
Ramesh Sivaraman
added a comment - - edited ok to push the 11.6 version only (this was tested using preview-11.6-preview).
For 10.6, bb-10.6-monty was tested previously, but not bb-10.6-master-timestamp yet.
bnestere - Please create a new ticket for the custom build (presumably based on the updated bb-10.6-master-timestamp), once it's ready for testing. The build and branch will need separate testing.
The following new variables replaces 'seconds_behind_master':
Master_last_event_time Timestamp of the last committed event read from the master by the IO thread.
Slave_last_event_time Master timestamp of the last event committed on the slave.
Master_Slave_time_diff The difference of the above two timestamps.
In addition SLAVE STATUS is now part of information_schema.slave_status. One can now query this table instead of using SHOW SLAVE STATUS or SHOW ALL SLAVES STATUS.
One benefit of information_schema.slave_status is that it is now easy to just extract some of the fields instead of having to always get all fields as one gets with SHOW SLAVE STATUS.
Michael Widenius
added a comment - A summary of the commit:
The following new variables replaces 'seconds_behind_master':
Master_last_event_time Timestamp of the last committed event read from the master by the IO thread.
Slave_last_event_time Master timestamp of the last event committed on the slave.
Master_Slave_time_diff The difference of the above two timestamps.
In addition SLAVE STATUS is now part of information_schema.slave_status. One can now query this table instead of using SHOW SLAVE STATUS or SHOW ALL SLAVES STATUS.
One benefit of information_schema.slave_status is that it is now easy to just extract some of the fields instead of having to always get all fields as one gets with SHOW SLAVE STATUS.
Kristian Nielsen
added a comment - Review: https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/LQZOIASN4VSOFQJXVDOOO5NCYOELEXOS/
Brandon Nesterenko
added a comment - Hi knielsen !
I took over for Monty in making the edits for your review. Can you review the changes?
+ e1ee2ed0522...071dcfaeca5 bb-11.6-MDEV-33856-bnestere -> bb-11.6-MDEV-33856-bnestere
Another solutions for how we could adjust times when calculating Seconds_Behind_master:
Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min)
Last commit on slave was 2023-01-01 00:00:00
Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave.
currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state.
However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00
The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00.
From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query.
The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time.
The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.