[MDEV-31895] Report a Replica's Time Difference with its Primary - Jira

Details

Type: New Feature
Status: Closed
Priority: Major
Resolution: Won't Fix
Fix Version/s: N/A
Component/s: Replication
Labels: None

Description

Extend SHOW SLAVE STATUS to display the variable mi->clock_diff_with_master.

Additionally, MTR tests which use Seconds_Behind_Master should be updated to adjust its value based on this variable. This will remove the need for the debug point "negate_clock_diff_with_master", and tests which rely on have_debug.inc for that purpose can be freed from that dependency.

Issue Links

relates to

MDEV-16091 Seconds_Behind_Master spikes to millions of seconds (Closed)

MDEV-30619 Parallel Slave SQL Thread Can Update Seconds_Behind_Master with Active Workers (Closed)

Activity

Brandon Nesterenko added a comment - 2024-02-06 14:24

With more thought, I wonder if presenting this value may be problematic. I originally encountered this issue when an expected equivalence

Seconds_Behind_Master + SQL_Remaining_Delay == SQL_Delay

didn't hold true, because the mi->clock_diff_with_master variable is used to set Seconds_Behind_Master.

This value is calculated by comparing the primary's system clock, i.e. with

SELECT UNIX_TIMESTAMP()

against the current clock of the slave, with the granularity of seconds.

The issue in the MTR tests was that the master could be queried for its timestamp just before ticking to the next second, and by the time the replica checked its own timestamp, it had reached the next second. So mi->clock_diff_with_master would compute to "1", despite the master and slave using the same system clock.
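The boundary effect described above can be reproduced with a small sketch. This is only an illustration, not the server's code: the function name and the sample timestamps are hypothetical, and it assumes the difference is taken as the replica's reading minus the primary's reading after each has been truncated to whole seconds.

```python
import math


def clock_diff_with_master(primary_read_at: float, replica_read_at: float) -> int:
    """Hypothetical model: subtract two readings of the SAME clock,
    each truncated to whole seconds before the subtraction."""
    primary_seconds = math.floor(primary_read_at)  # stands in for SELECT UNIX_TIMESTAMP()
    replica_seconds = math.floor(replica_read_at)  # replica's own clock, whole seconds
    return replica_seconds - primary_seconds


# Both pairs of readings are taken 2 ms apart on one shared clock.
same_second = clock_diff_with_master(1000.100, 1000.102)
across_boundary = clock_diff_with_master(1000.999, 1001.001)

print(same_second)      # readings fall within the same second
print(across_boundary)  # readings straddle a second boundary
```

In this model the two readings are equally close in both cases, yet only the pair that straddles a second boundary yields a non-zero difference, which matches the spurious "1" seen in the tests.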

In a real scenario, this seems like it could be problematic, as the replica and primary could be very close (within the NTP standard), yet we would report one second (or potentially more if the system scheduler is messed up), and falsely scare admins.

Elkin, knielsen do you have any thoughts?

Kristian Nielsen added a comment - 2024-02-07 17:01

I think your concern is very real that exposing additional information like clock_diff_with_master can easily lead to confusion. If that's enough to decide not to expose it, I'm not sure though.

Having followed replication closely for many years, the "Seconds_behind_master" is an endless source of confusion, from the very nature of the problem it's supposed to describe. Every few years someone comes up with a new corner case where they want it to mean something different, or a new idea for how to tune the value. The end result is of course that the meaning doesn't become any clearer or less confusing, only more complicated to understand.

I have long since given up on "fixing" the Seconds_behind_master; I think there will always be someone unhappy about how it works, no matter how it's changed.

Brandon Nesterenko added a comment - 2024-08-06 13:56

With MDEV-33856, this now seems unnecessary.

People

Assignee: Brandon Nesterenko

Reporter: Brandon Nesterenko

Votes: 0

Watchers: 3

Dates

Created: 2023-08-10 13:46

Updated: 2024-08-06 13:56

Resolved: 2024-08-06 13:56

MariaDB Server