  1. MariaDB Server
  2. MDEV-31895

Report a Replica's Time Difference with its Primary

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Won't Fix
    • N/A
    • Component/s: Replication
    • None

    Description

      Extend SHOW SLAVE STATUS to display the variable mi->clock_diff_with_master.

      Additionally, MTR tests that use Seconds_Behind_Master should be updated to adjust its value based on this variable. This will remove the need for the debug point "negate_clock_diff_with_master", and tests that rely on have_debug.inc only for that purpose can be freed from that dependency.

          Activity

            On further thought, I wonder whether presenting this value may be problematic. I originally encountered this issue when an expected equality

            Seconds_Behind_Master + SQL_Remaining_Delay == SQL_Delay
            

            didn't hold true, because the mi->clock_diff_with_master variable is used to set Seconds_Behind_Master.

            This value is calculated by comparing the primary's system clock, i.e. the result of

            SELECT UNIX_TIMESTAMP()
            

            against the replica's current clock, at a granularity of one second.

            The issue in the MTR tests was that the master could be queried for its timestamp just before its clock ticked over to the next second, and by the time the replica checked its own timestamp, the clock had reached the next second. So mi->clock_diff_with_master would be computed as "1", despite the master and slave using the same system clock.
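
The tick-boundary effect described above can be reproduced outside the server. This is only a minimal sketch, not the server's implementation: `clock_diff_seconds` is a hypothetical helper, and the timestamps are made-up values chosen to straddle a second boundary. It assumes each side's clock reading is truncated to whole seconds before the two are subtracted, as the comment describes.

```python
import math


def clock_diff_seconds(primary_now: float, replica_now: float) -> int:
    """Whole-second clock difference (hypothetical helper).

    Each reading is truncated to whole seconds before subtracting,
    mimicking a comparison done at one-second granularity.
    """
    return math.floor(replica_now) - math.floor(primary_now)


# Both servers share one system clock; the replica simply reads it
# 2 ms after the primary answered.
near_tick = clock_diff_seconds(1000.999, 1001.001)  # reads straddle a tick
mid_second = clock_diff_seconds(1000.400, 1000.402)  # reads within one second

print(near_tick)   # 1
print(mid_second)  # 0
```

The same 2 ms gap yields either 0 or 1 depending only on where within the second the two reads happen to fall, which is why the value can be nonzero even on a single host.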

            In a real deployment this could be problematic: the replica and primary clocks could be very close (within NTP tolerance), yet we would report a difference of one second (or potentially more if the system scheduler is misbehaving) and needlessly alarm admins.

            Elkin, knielsen, do you have any thoughts?

            bnestere Brandon Nesterenko added a comment

            I think your concern is very real that exposing additional information like clock_diff_with_master can easily lead to confusion. Whether that's enough to decide not to expose it, I'm not sure.

            Having followed replication closely for many years, I find "Seconds_Behind_Master" an endless source of confusion, owing to the very nature of the problem it's supposed to describe. Every few years someone comes up with a new corner case in which they want it to mean something different, or a new idea for how to tune the value. The end result, of course, is that the meaning doesn't become any clearer or less confusing, only more complicated to understand.

            I have long since given up on "fixing" Seconds_Behind_Master; I think there will always be someone unhappy about how it works, no matter how it's changed.

            knielsen Kristian Nielsen added a comment

            With MDEV-33856, this now seems unnecessary.

            bnestere Brandon Nesterenko added a comment

            People

              Assignee: bnestere Brandon Nesterenko
              Reporter: bnestere Brandon Nesterenko