Details
-
New Feature
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
Description
The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at:
https://mariadb.com/kb/en/delayed-replication/
The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.
The suggested changes are:
- Parallel execution uses the time AFTER execution while 'normal' replication used the value BEFORE to calculate Seconds_Behind_Master. This should be changed to always use AFTER.
- Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: Newest event completion time from master (when writing to the relay log) - Completion time for last committed event from the master. If the slave is 'idle' then the Seconds_Behind_Master should be 0.
As an example with SQL_DELAY=5 hours:
- IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
- The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01
In this case the Seconds_Behind_master will be 5 hours.
The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
- Without and without delayed replication
- When the master has been down and is just coming up.
- If the master has been 'idle' for some time.
- When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting Newest event completion time from master as 'master current time' when connecting.
Note too that the final commit also adds in the table information_schema.slave_status (MDEV-33526), as an alias for SHOW ALL SLAVES STATUS. So these fields can now be queried by SELECT statements.
Attachments
Issue Links
- causes
-
MDEV-34354 Shows replication time difference in master_last_event_time after setting MASTER_DELAY > 0 in chain replication
-
- Closed
-
-
MDEV-34420 information_schema.slave_status Bad String Cut-off
-
- Closed
-
-
MDEV-34752 Create alias replica_status for INFORMATION_SCHEMA.slave_status
-
- Open
-
-
MDEV-34765 rpl.master_last_event_time_stmt fails with Result Length Mismatch
-
- Closed
-
-
MDEV-35939 rpl.rpl_parallel_sbm: "Slave_last_event_time is not equal to Master_last_event_time"
-
- Closed
-
-
MDEV-36001 DBUG_ASSERT(master_timestamp >= slave_timestamp) fails in rpl_gtid_crash
-
- Open
-
- includes
-
MDEV-32172 introspect server's replication settings from SQL stored routines
-
- Closed
-
-
MDEV-33526 Create IS.slave_status table as alias for show replica status command
-
- Closed
-
- relates to
-
MDEV-11123 `Seconds_Behind_Master` is not accessible through `information_schema`
-
- Closed
-
-
MDEV-13590 Convert SHOW master/slave statments to information_schema plugin
-
- Open
-
-
MDEV-34843 Seconds_Behind_Master is not zero after slave syncs up with master
-
- Open
-
-
MDEV-35858 rpl.rpl_parallel_sbm: Slave_last_event_time is not equal to Master_last_event_time
-
- Open
-
Activity
Field | Original Value | New Value |
---|---|---|
Key |
|
|
Project | MariaDB Enterprise [ 11500 ] | MariaDB Server [ 10000 ] |
Fix Version/s | 11.5 [ 29506 ] |
Assignee | Andrei Elkin [ elkin ] |
Description |
The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at: https://mariadb.com/kb/en/delayed-replication/ The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries. The suggested changes are: * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*. * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0. As an example with SQL_DELAY=5 hours: - IO thread reads an event that was completed at 2023:01:01 06:01:01 - The SQL thread commits an event that was originally committed at 2023:01:01 01:01:01. In this case the Seconds_Behind_master is 5 hours. The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases: * Without and without delayed replication * When the master has been down and is just coming up. * If the master has been 'idle' for some time. * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting. |
The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at: https://mariadb.com/kb/en/delayed-replication/ The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries. The suggested changes are: * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*. * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0. As an example with SQL_DELAY=5 hours: - IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master - The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01 In this case the Seconds_Behind_master will be 5 hours. The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases: * Without and without delayed replication * When the master has been down and is just coming up. * If the master has been 'idle' for some time. * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting. |
Fix Version/s | 11.6 [ 29515 ] | |
Fix Version/s | 11.5 [ 29506 ] |
Issue Type | Task [ 3 ] | New Feature [ 2 ] |
Assignee | Andrei Elkin [ elkin ] | Michael Widenius [ monty ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | In Testing [ 10301 ] |
Assignee | Michael Widenius [ monty ] | Ramesh Sivaraman [ JIRAUSER48189 ] |
Link | This issue is part of TODO-4765 [ TODO-4765 ] |
Link |
This issue causes |
Link | This issue relates to MENT-2095 [ MENT-2095 ] |
Labels | Preview_11.6 |
Link |
This issue includes |
Zendesk Related Tickets | 201691 | |
Zendesk active tickets | 201691 |
Assignee | Ramesh Sivaraman [ JIRAUSER48189 ] | Brandon Nesterenko [ JIRAUSER48702 ] |
Status | In Testing [ 10301 ] | Stalled [ 10000 ] |
Link |
This issue relates to |
Link | This issue relates to MDEV-13590 [ MDEV-13590 ] |
Summary | New definition for Seconds_Behind_Master | Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps |
Assignee | Brandon Nesterenko [ JIRAUSER48702 ] | Kristian Nielsen [ knielsen ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
Assignee | Kristian Nielsen [ knielsen ] | Brandon Nesterenko [ JIRAUSER48702 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Component/s | Information Schema [ 14413 ] | |
Component/s | Replication [ 10100 ] | |
Fix Version/s | 11.6.0 [ 29839 ] | |
Fix Version/s | 11.6 [ 29515 ] | |
Assignee | Brandon Nesterenko [ JIRAUSER48702 ] | Michael Widenius [ monty ] |
Resolution | Fixed [ 1 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Description |
The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at: https://mariadb.com/kb/en/delayed-replication/ The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries. The suggested changes are: * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*. * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0. As an example with SQL_DELAY=5 hours: - IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master - The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01 In this case the Seconds_Behind_master will be 5 hours. The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases: * Without and without delayed replication * When the master has been down and is just coming up. * If the master has been 'idle' for some time. * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting. |
The current definition of Seconds_Behind_Master is both complex and confusing.
This can be seen in the documentation of the variable at: https://mariadb.com/kb/en/delayed-replication/ The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries. The suggested changes are: * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*. * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0. As an example with SQL_DELAY=5 hours: - IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master - The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01 In this case the Seconds_Behind_master will be 5 hours. The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases: * Without and without delayed replication * When the master has been down and is just coming up. * If the master has been 'idle' for some time. * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting. Note too that the final commit also adds in the table information_schema.slave_status ( |
Link |
This issue relates to |
Link |
This issue relates to |
Link |
This issue includes |
Link | This issue causes MDEV-34752 [ MDEV-34752 ] |
Link |
This issue causes |
Link |
This issue causes |
Link | This issue relates to MENT-2141 [ MENT-2141 ] |
Link | This issue is part of TODO-4980 [ TODO-4980 ] |
Link | This issue relates to TODO-5070 [ TODO-5070 ] |
Link |
This issue causes |
Link | This issue relates to MDEV-35858 [ MDEV-35858 ] |
Link |
This issue causes |
Link | This issue relates to MDEV-34843 [ MDEV-34843 ] |
Link |
This issue causes |
Link | This issue causes MDEV-36001 [ MDEV-36001 ] |
Another solutions for how we could adjust times when calculating Seconds_Behind_master:
Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min)
Last commit on slave was 2023-01-01 00:00:00
Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave.
currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state.
However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00
The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00.
From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query.
The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time.
The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.