Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33856

Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps

Details

    Description

      The current definition of Seconds_Behind_Master is both complex and confusing.
      This can be seen in the documentation of the variable at:
      https://mariadb.com/kb/en/delayed-replication/

      The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

      The suggested changes are:

      • Parallel execution uses the time AFTER execution while 'normal' replication used the value BEFORE to calculate Seconds_Behind_Master. This should be changed to always use AFTER.
      • Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: Newest event completion time from master (when writing to the relay log) - Completion time for last committed event from the master. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

      As an example with SQL_DELAY=5 hours:

      • IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
      • The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

      In this case the Seconds_Behind_master will be 5 hours.

      The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:

      • Without and without delayed replication
      • When the master has been down and is just coming up.
      • If the master has been 'idle' for some time.
      • When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting Newest event completion time from master as 'master current time' when connecting.

      Note too that the final commit also adds in the table information_schema.slave_status (MDEV-33526), as an alias for SHOW ALL SLAVES STATUS. So these fields can now be queried by SELECT statements.

      Attachments

        Issue Links

          Activity

            monty Michael Widenius created issue -
            monty Michael Widenius made changes -
            Field Original Value New Value
            Key MENT-2063 MDEV-33856
            Project MariaDB Enterprise [ 11500 ] MariaDB Server [ 10000 ]
            monty Michael Widenius made changes -
            Fix Version/s 11.5 [ 29506 ]
            Elkin Andrei Elkin made changes -
            Assignee Andrei Elkin [ elkin ]

            Another solutions for how we could adjust times when calculating Seconds_Behind_master:

            Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min)
            Last commit on slave was 2023-01-01 00:00:00
            Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave.
            currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state.
            However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00
            The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00.
            From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query.

            The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time.
            The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.

            monty Michael Widenius added a comment - Another solutions for how we could adjust times when calculating Seconds_Behind_master: Assuming the master has done nothing for a while and a delayed-slave is idle (SQL_delay is 60 min) Last commit on slave was 2023-01-01 00:00:00 Then master does a commit at 2023-01-01 12:00:01 and sends binary log to slave. currently, and in the proposal, the slave would be 12 hours delayed while the query is in delayed-state. However, if we look at 'when data on the master was identical as it is now on the slave', that was between 2023-01:00 00:00:00 - 2023-01-01 12:00:00 The current code and the proposal says that we should use lower value 2023-01:00 00:00:00, but logically it would be better to use the upper one 2023-01-01 12:00:00. From the observers point of view this would mean that instead of sayings that the slave is initially 12 hours delayed, it would instead initially be 1 second delayed and growing up to 60 min + time to execute the query. The nice thing with this approach is the Seconds_behind_master does not jump from 0 to SQL_delay in one go, but will increase one seconds at a time. The disadvantage is that to calculate Seconds_behind_master we would need to know the time of the 'next event after the current one' to get the higher value of the 'gap'.
            monty Michael Widenius made changes -
            Description The current definition of Seconds_Behind_Master is both complex and confusing.
            This can be seen in the documentation of the variable at:
            https://mariadb.com/kb/en/delayed-replication/

            The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

            The suggested changes are:
            * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
            * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

            As an example with SQL_DELAY=5 hours:
            - IO thread reads an event that was completed at 2023:01:01 06:01:01
            - The SQL thread commits an event that was originally committed at 2023:01:01 01:01:01.

            In this case the Seconds_Behind_master is 5 hours.

            The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
            * Without and without delayed replication
            * When the master has been down and is just coming up.
            * If the master has been 'idle' for some time.
            * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.
            The current definition of Seconds_Behind_Master is both complex and confusing.
            This can be seen in the documentation of the variable at:
            https://mariadb.com/kb/en/delayed-replication/

            The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

            The suggested changes are:
            * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
            * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

            As an example with SQL_DELAY=5 hours:
            - IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
            - The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

            In this case the Seconds_Behind_master will be 5 hours.

            The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
            * Without and without delayed replication
            * When the master has been down and is just coming up.
            * If the master has been 'idle' for some time.
            * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.
            salle Alexander Keremidarski added a comment - - edited

            Please consider also higher priority for MDEV-32446
            In my opinion "N transactions behind master" can be very helpful along with seconds_behind_master.
            A DBA should be less concerned about long replication lag if there is only a single or couple of long transactions causing it while it can be a concern if the lag is within seconds, but the slave is hundreds and thousands of transacions behind.
            GTIDs_behind_master in slave status can also help when the seconds_behind_master is not understood well.
            Consider an example with:
            sesconds_behind_master = 2753
            GTIDs_behind_master = 1

            salle Alexander Keremidarski added a comment - - edited Please consider also higher priority for MDEV-32446 In my opinion "N transactions behind master" can be very helpful along with seconds_behind_master. A DBA should be less concerned about long replication lag if there is only a single or couple of long transactions causing it while it can be a concern if the lag is within seconds, but the slave is hundreds and thousands of transacions behind. GTIDs_behind_master in slave status can also help when the seconds_behind_master is not understood well. Consider an example with: sesconds_behind_master = 2753 GTIDs_behind_master = 1
            Elkin Andrei Elkin added a comment - - edited

            salle, the MDEV-32446 concern is actually addressed, and for a single gtid domain it's

            GTIDs_behind_master = ShowSlaveStatus::Gtid_IO_Pos - @@global.gtid_slave_pos
            

            The relation is not generally true, but at least it approximates (still may not be true if the application messes with @@session.gtid_*) reliably for normal cases of gtid_strict_mode is ON.

            Elkin Andrei Elkin added a comment - - edited salle , the MDEV-32446 concern is actually addressed, and for a single gtid domain it's GTIDs_behind_master = ShowSlaveStatus::Gtid_IO_Pos - @@global.gtid_slave_pos The relation is not generally true, but at least it approximates (still may not be true if the application messes with @@session.gtid_* ) reliably for normal cases of gtid_strict_mode is ON.
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.6 [ 29515 ]
            Fix Version/s 11.5 [ 29506 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]

            The problem with GTID's_behind_master is that it is not very useful for delayed slave.

            monty Michael Widenius added a comment - The problem with GTID's_behind_master is that it is not very useful for delayed slave.
            monty Michael Widenius added a comment - - edited

            To not cause issues with old applications and still be able to provide
            a fix for 10.6, my suggestion is to add a few new slave status
            variables to fix the shortcomings the current definition of
            Seconds_behind_master.
            In 11.6 we could then deprecate Seconds_behind_master and encourage
            users to switch to monitor the new variables.

            Master_last_event_time Master timestamp of the last event read from the
            master. Is NULL when no active replication.
            Slave_state_time Master timestamp of event to be executed on the
            slave - 1 second. If no events, then same as
            Master_last_event_time.
            In case of delayed slave, the Master timestamp
            of the next to be executed event - 1 second.
            This can be interpreted as 'the slave is
            a snapshot of the master at this time'.
            Master_slave_time_diff Difference between the above two events. NULL
            if Master_last_event_time is not set.

            The above definitions works both for normal slaves and delayed slaves.

            As an example with SQL_DELAY=5 hours:

            Master starts sending events from a binary log with old data:

            • Slave_state_time is first set to 'first event - 1 second'
            • Slave_state_time will be update before each event is executed and
              the Master_slave_time_diff will eventually reach 5 hours at the same
              time as Master_last_event_time reaches 'current clock time'.
            • It will grow up to '5 hours + execution time of last statement' and
              then go down to 5 hours, if there are other events waiting to
              be executed.

            When master stops/pauses sending events;

            • Slave_state_time will over time go up to be closer to Master_last_event_time
              and Master_slave_time_diff will eventually become 0.
              (As Slave is fully in sync with Master).

            If there are new events coming in after 5 hours,

            • The Slave_state_time will slowly start growing up from 0 to 5 hours +
              execution time for longest delayed item.

            More concrete example.

            Assuming a new master with no binary log events connecting to a new slave:

            • IO thread reads an event that was completed at 2023:01:01 01:01:01
            • Slave reads event
            • Updates Master_last_event_time to 2023:01:01 01:01:01
            • Puts event in Delay queue
            • Updates Slave_state_time to 2023:01:01 01:01:00
            • IO thread reads an event that was completed at 2023:01:01 02:01:01
            • Slave reads event
            • Updates Master_last_event_time to 2023:01:01 02:01:01
            • Puts event in Delay queue
            • Master_slave_time_diff is now 3601
            • Clock reaches 2023:01:01 06:01:01:
            • Master_slave_time_diff is now 60*5+1 = 18001
            • Event from 2023:01:01 01:01:01 is executed.
            • Slave_state_time is now set to 2023:01:01 02:01:00 (as this is
              1 second less than the next event to be executed).
            • Master_slave_time_diff is now 60*4+1 = 14401

            Instead of having 1 second difference, we could have higher precision
            and use 1/1000 of a second difference. The logic is still the same.

            monty Michael Widenius added a comment - - edited To not cause issues with old applications and still be able to provide a fix for 10.6, my suggestion is to add a few new slave status variables to fix the shortcomings the current definition of Seconds_behind_master. In 11.6 we could then deprecate Seconds_behind_master and encourage users to switch to monitor the new variables. Master_last_event_time Master timestamp of the last event read from the master. Is NULL when no active replication. Slave_state_time Master timestamp of event to be executed on the slave - 1 second. If no events, then same as Master_last_event_time. In case of delayed slave, the Master timestamp of the next to be executed event - 1 second. This can be interpreted as 'the slave is a snapshot of the master at this time'. Master_slave_time_diff Difference between the above two events. NULL if Master_last_event_time is not set. The above definitions works both for normal slaves and delayed slaves. As an example with SQL_DELAY=5 hours: Master starts sending events from a binary log with old data: Slave_state_time is first set to 'first event - 1 second' Slave_state_time will be update before each event is executed and the Master_slave_time_diff will eventually reach 5 hours at the same time as Master_last_event_time reaches 'current clock time'. It will grow up to '5 hours + execution time of last statement' and then go down to 5 hours, if there are other events waiting to be executed. When master stops/pauses sending events; Slave_state_time will over time go up to be closer to Master_last_event_time and Master_slave_time_diff will eventually become 0. (As Slave is fully in sync with Master). If there are new events coming in after 5 hours, The Slave_state_time will slowly start growing up from 0 to 5 hours + execution time for longest delayed item. More concrete example. Assuming a new master with no binary log events connecting to a new slave: IO thread reads an event that was completed at 2023:01:01 01:01:01 Slave reads event Updates Master_last_event_time to 2023:01:01 01:01:01 Puts event in Delay queue Updates Slave_state_time to 2023:01:01 01:01:00 IO thread reads an event that was completed at 2023:01:01 02:01:01 Slave reads event Updates Master_last_event_time to 2023:01:01 02:01:01 Puts event in Delay queue Master_slave_time_diff is now 3601 Clock reaches 2023:01:01 06:01:01: Master_slave_time_diff is now 60*5+1 = 18001 Event from 2023:01:01 01:01:01 is executed. Slave_state_time is now set to 2023:01:01 02:01:00 (as this is 1 second less than the next event to be executed). Master_slave_time_diff is now 60*4+1 = 14401 Instead of having 1 second difference, we could have higher precision and use 1/1000 of a second difference. The logic is still the same.
            Elkin Andrei Elkin added a comment -

            monty, I really like the idea which In essence it is about presenting "slave is behind" as the subtract between
            the last queued event' timestamp and the first queued (which is also next to execute) one's timestamp.
            So when the IO thread has received the last event g_100 , and the applier (SQL) thread is about to take on g_2, the queue
            would look like [ g_100, g_99, .... g_2 ], and timestamp(g_100) - timestamp(g_2) is the value of how much behind (I simplify here with omitting -1 or 1/1000 etc).

            This is all good also 'cos
            A. your method is not interested in the slave local time;
            B. it also covers the zero-delayed and it's not difficult to see how it would do well with the parallel applier.

            Fyi, bnestere.

            Elkin Andrei Elkin added a comment - monty , I really like the idea which In essence it is about presenting "slave is behind" as the subtract between the last queued event' timestamp and the first queued (which is also next to execute) one's timestamp. So when the IO thread has received the last event g_100 , and the applier (SQL) thread is about to take on g_2 , the queue would look like [ g_100, g_99, .... g_2 ] , and timestamp(g_100) - timestamp(g_2) is the value of how much behind (I simplify here with omitting -1 or 1/1000 etc). This is all good also 'cos A. your method is not interested in the slave local time; B. it also covers the zero-delayed and it's not difficult to see how it would do well with the parallel applier. Fyi, bnestere .
            monty Michael Widenius made changes -
            Assignee Andrei Elkin [ elkin ] Michael Widenius [ monty ]
            monty Michael Widenius made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bnestere Brandon Nesterenko made changes -
            Status In Progress [ 3 ] In Testing [ 10301 ]
            bnestere Brandon Nesterenko made changes -
            Assignee Michael Widenius [ monty ] Ramesh Sivaraman [ JIRAUSER48189 ]

            Hi ramesh!

            Thanks for stepping in to test this! The latest patch is currently on bb-10.6-monty

            bnestere Brandon Nesterenko added a comment - Hi ramesh ! Thanks for stepping in to test this! The latest patch is currently on bb-10.6-monty
            serg Sergei Golubchik made changes -

            This appears to cause a massive amount of MemorySanitizer failures. The following change would seem to fix it:

            diff --git a/sql/sql_class.h b/sql/sql_class.h
            index af589243fbc..8802752bd26 100644
            --- a/sql/sql_class.h
            +++ b/sql/sql_class.h
            @@ -3168,7 +3168,7 @@ class THD: public THD_count, /* this must be first */
               ulonglong  start_utime, utime_after_lock, utime_after_query;
               /* This can be used by handlers to send signals to the SQL level */
               ulonglong  replication_flags;
            -  ulong orig_exec_time;              // Exec time for last read binlog event
            +  my_time_t orig_exec_time= 0;       // Exec time for last read binlog event
               // Process indicator
               struct {
                 /*
            diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc
            index 747db369d75..2699ec773f8 100644
            --- a/sql/log_event_server.cc
            +++ b/sql/log_event_server.cc
            @@ -1352,7 +1352,7 @@ Query_log_event::Query_log_event(THD* thd_arg, const char* query_arg,
               */
               exec_time= ((thd->slave_thread) ?
                           thd->orig_exec_time :
            -              (ulong) (my_time(0) - thd_arg->start_time));
            +              (my_time(0) - thd_arg->start_time));
             
               /**
                 @todo this means that if we have no catalog, then it is replicated
            index c2f56ea9771..69ba7cde440 100644
            --- a/sql/log_event.h
            +++ b/sql/log_event.h
            @@ -1297,7 +1297,7 @@ class Log_event
               my_time_t when;
               ulong     when_sec_part;
               /* The number of seconds the query took to run on the master. */
            -  ulong exec_time;
            +  my_time_t exec_time;
               /* Number of bytes written by write() function */
               size_t data_written;
             
            

            I took the liberty of redefining some data types consistently. The ulong type should be avoided, because its size differs between LP64 and LLP64 ABIs.

            The second hunk identifies where the garbage was being copied to Query_log_event::exec_time. From there it would propagate to the local variable buf in Query_log_event::write().

            marko Marko Mäkelä added a comment - This appears to cause a massive amount of MemorySanitizer failures. The following change would seem to fix it: diff --git a/sql/sql_class.h b/sql/sql_class.h index af589243fbc..8802752bd26 100644 --- a/sql/sql_class.h +++ b/sql/sql_class.h @@ -3168,7 +3168,7 @@ class THD: public THD_count, /* this must be first */ ulonglong start_utime, utime_after_lock, utime_after_query; /* This can be used by handlers to send signals to the SQL level */ ulonglong replication_flags; - ulong orig_exec_time; // Exec time for last read binlog event + my_time_t orig_exec_time= 0; // Exec time for last read binlog event // Process indicator struct { /* diff --git a/sql/log_event_server.cc b/sql/log_event_server.cc index 747db369d75..2699ec773f8 100644 --- a/sql/log_event_server.cc +++ b/sql/log_event_server.cc @@ -1352,7 +1352,7 @@ Query_log_event::Query_log_event(THD* thd_arg, const char* query_arg, */ exec_time= ((thd->slave_thread) ? thd->orig_exec_time : - (ulong) (my_time(0) - thd_arg->start_time)); + (my_time(0) - thd_arg->start_time)); /** @todo this means that if we have no catalog, then it is replicated index c2f56ea9771..69ba7cde440 100644 --- a/sql/log_event.h +++ b/sql/log_event.h @@ -1297,7 +1297,7 @@ class Log_event my_time_t when; ulong when_sec_part; /* The number of seconds the query took to run on the master. */ - ulong exec_time; + my_time_t exec_time; /* Number of bytes written by write() function */ size_t data_written; I took the liberty of redefining some data types consistently. The ulong type should be avoided, because its size differs between LP64 and LLP64 ABIs. The second hunk identifies where the garbage was being copied to Query_log_event::exec_time . From there it would propagate to the local variable buf in Query_log_event::write() .
            bnestere Brandon Nesterenko made changes -

            bnestere Could you please review Marko's input and push a final patch to the test branch? (I am assisting Ramesh with testing this)
            Thank you!

            Roel Roel Van de Paar added a comment - bnestere Could you please review Marko's input and push a final patch to the test branch? (I am assisting Ramesh with testing this) Thank you!

            Hi Roel!

            It is already done note that Monty created a new branch to house this feature: bb-10.6-master-timestamp.

            It is a little strange, but right now the patch exists in two places: bb-10.6-master-timestamp (which is to be ported to 10.6 enterprise), and preview-11.6-preview. I'll have to check in with Monty to see if the plan is to create a custom build for the customer that wants this, as I imagine that will affect your testing (I imagine both versions will need testing).

            So I think you and ramesh can switch to testing on the 11.6 preview branch until we port the patch to enterprise. Another thing to note is that this last weekend, Monty added an information_schema.slave_status table to provide an easier-to-use interface for SHOW SLAVE STATUS (though note it is currently affected by MDEV-34420).

            bnestere Brandon Nesterenko added a comment - Hi Roel ! It is already done note that Monty created a new branch to house this feature: bb-10.6-master-timestamp . It is a little strange, but right now the patch exists in two places: bb-10.6-master-timestamp (which is to be ported to 10.6 enterprise), and preview-11.6-preview . I'll have to check in with Monty to see if the plan is to create a custom build for the customer that wants this, as I imagine that will affect your testing (I imagine both versions will need testing). So I think you and ramesh can switch to testing on the 11.6 preview branch until we port the patch to enterprise. Another thing to note is that this last weekend, Monty added an information_schema.slave_status table to provide an easier-to-use interface for SHOW SLAVE STATUS (though note it is currently affected by MDEV-34420 ).
            bnestere Brandon Nesterenko made changes -
            Roel Roel Van de Paar added a comment - - edited

            Thank you bnestere! I see development is ongoing in the branch. Yes, please let us know once the ES port branch is ready for 10.6 testing.

            For 11.6, it would be helpful to have a branch that has only this feature on top of 11.6 trunk, is it available?

            In the meantime, I have started testing preview-11.6-preview.

            Roel Roel Van de Paar added a comment - - edited Thank you bnestere ! I see development is ongoing in the branch. Yes, please let us know once the ES port branch is ready for 10.6 testing. For 11.6, it would be helpful to have a branch that has only this feature on top of 11.6 trunk, is it available? In the meantime, I have started testing preview-11.6-preview.

            According to monty, the initialization of THD::orig_exec_time was intentionally absent. A large number of failures would be fixed by the following patch, but at least the replication of LOAD DATA seems to require a further fix.

            diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc
            index 51ca3b183ec..d279761767a 100644
            --- a/sql/rpl_parallel.cc
            +++ b/sql/rpl_parallel.cc
            @@ -60,7 +60,10 @@ rpt_handle_event(rpl_parallel_thread::queued_event *qev,
               rgi->future_event_relay_log_pos= qev->future_event_relay_log_pos;
               strcpy(rgi->future_event_master_log_name, qev->future_event_master_log_name);
               if (event_can_update_last_master_timestamp(ev))
            +  {
                 rgi->last_master_timestamp= ev->when + ev->exec_time;
            +    thd->orig_exec_time= ev->exec_time;
            +  }
               err= apply_event_and_update_pos_for_parallel(ev, thd, rgi);
             
               rli->executed_entries++;
            

            marko Marko Mäkelä added a comment - According to monty , the initialization of THD::orig_exec_time was intentionally absent. A large number of failures would be fixed by the following patch, but at least the replication of LOAD DATA seems to require a further fix. diff --git a/sql/rpl_parallel.cc b/sql/rpl_parallel.cc index 51ca3b183ec..d279761767a 100644 --- a/sql/rpl_parallel.cc +++ b/sql/rpl_parallel.cc @@ -60,7 +60,10 @@ rpt_handle_event(rpl_parallel_thread::queued_event *qev, rgi->future_event_relay_log_pos= qev->future_event_relay_log_pos; strcpy(rgi->future_event_master_log_name, qev->future_event_master_log_name); if (event_can_update_last_master_timestamp(ev)) + { rgi->last_master_timestamp= ev->when + ev->exec_time; + thd->orig_exec_time= ev->exec_time; + } err= apply_event_and_update_pos_for_parallel(ev, thd, rgi); rli->executed_entries++;

            Which value are we expected to see in the case below?

            • The IO thread was stopped 1 hour ago and it's 8:00
            • The time of the last commit in the relay log is 7:00
            • The time after the last applied event is 6:00
            f_razzoli Federico Razzoli added a comment - Which value are we expected to see in the case below? The IO thread was stopped 1 hour ago and it's 8:00 The time of the last commit in the relay log is 7:00 The time after the last applied event is 6:00

            Hi f_razzoli!

            Note that the current time seen on the slave is not considered in these new fields. All that is showed is the time on the event in the binary log (which will be the time was executed on the master).

            Master_last_event_time: 7:00. // This field will be updated when the IO thread receives an event from the master.
            Slave_last_event_time: 6:00. // This field will be updated when a transaction commits (by the SQL thread/worker threads)
            Master_Slave_time_diff: 60 // A field for convenience to show the difference between Master_last_event_time and Slave_last_event_time in seconds.

            bnestere Brandon Nesterenko added a comment - Hi f_razzoli ! Note that the current time seen on the slave is not considered in these new fields. All that is showed is the time on the event in the binary log (which will be the time was executed on the master). Master_last_event_time: 7:00. // This field will be updated when the IO thread receives an event from the master. Slave_last_event_time: 6:00. // This field will be updated when a transaction commits (by the SQL thread/worker threads) Master_Slave_time_diff: 60 // A field for convenience to show the difference between Master_last_event_time and Slave_last_event_time in seconds.
            ralf.gebhardt Ralf Gebhardt made changes -
            Labels Preview_11.6
            serg Sergei Golubchik made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201691
            Zendesk active tickets 201691
            ramesh Ramesh Sivaraman added a comment - - edited

            ok to push the 11.6 version only (this was tested using preview-11.6-preview).

            For 10.6, bb-10.6-monty was tested previously, but not bb-10.6-master-timestamp yet.

            bnestere- Please create a new ticket for the custom build (presumably based on the updated bb-10.6-master-timestamp), once it's ready for testing. The build and branch will need separate testing.

            ramesh Ramesh Sivaraman added a comment - - edited ok to push the 11.6 version only (this was tested using preview-11.6-preview). For 10.6, bb-10.6-monty was tested previously, but not bb-10.6-master-timestamp yet. bnestere - Please create a new ticket for the custom build (presumably based on the updated bb-10.6-master-timestamp), once it's ready for testing. The build and branch will need separate testing.
            ramesh Ramesh Sivaraman made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ] Brandon Nesterenko [ JIRAUSER48702 ]
            Status In Testing [ 10301 ] Stalled [ 10000 ]
            bnestere Brandon Nesterenko made changes -
            bnestere Brandon Nesterenko made changes -

            Newest code, all merged to one commit, can be found at bb-11.6-MDEV-33856-seconds_behind_master

            monty Michael Widenius added a comment - Newest code, all merged to one commit, can be found at bb-11.6- MDEV-33856 -seconds_behind_master

            A summary of the commit:

            The following new variables replaces 'seconds_behind_master':

            • Master_last_event_time Timestamp of the last committed event read from the master by the IO thread.
            • Slave_last_event_time Master timestamp of the last event committed on the slave.
            • Master_Slave_time_diff The difference of the above two timestamps.

            In addition SLAVE STATUS is now part of information_schema.slave_status. One can now query this table instead of using SHOW SLAVE STATUS or SHOW ALL SLAVES STATUS.
            One benefit of information_schema.slave_status is that it is now easy to just extract some of the fields instead of having to always get all fields as one gets with SHOW SLAVE STATUS.

            monty Michael Widenius added a comment - A summary of the commit: The following new variables replaces 'seconds_behind_master': Master_last_event_time Timestamp of the last committed event read from the master by the IO thread. Slave_last_event_time Master timestamp of the last event committed on the slave. Master_Slave_time_diff The difference of the above two timestamps. In addition SLAVE STATUS is now part of information_schema.slave_status. One can now query this table instead of using SHOW SLAVE STATUS or SHOW ALL SLAVES STATUS. One benefit of information_schema.slave_status is that it is now easy to just extract some of the fields instead of having to always get all fields as one gets with SHOW SLAVE STATUS.
            knielsen Kristian Nielsen added a comment - Review: https://lists.mariadb.org/hyperkitty/list/developers@lists.mariadb.org/thread/LQZOIASN4VSOFQJXVDOOO5NCYOELEXOS/
            bnestere Brandon Nesterenko made changes -
            Summary New definition for Seconds_Behind_Master Alternative Replication Lag Representation via Received/Executed Master Binlog Event Timestamps

            Hi knielsen!

            I took over for Monty in making the edits for your review. Can you review the changes?

            + e1ee2ed0522...071dcfaeca5 bb-11.6-MDEV-33856-bnestere -> bb-11.6-MDEV-33856-bnestere
            

            bnestere Brandon Nesterenko added a comment - Hi knielsen ! I took over for Monty in making the edits for your review. Can you review the changes? + e1ee2ed0522...071dcfaeca5 bb-11.6-MDEV-33856-bnestere -> bb-11.6-MDEV-33856-bnestere
            bnestere Brandon Nesterenko made changes -
            Assignee Brandon Nesterenko [ JIRAUSER48702 ] Kristian Nielsen [ knielsen ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            I had a quick look, looks ok, thanks.

            - Kristian.

            knielsen Kristian Nielsen added a comment - I had a quick look, looks ok, thanks. - Kristian.
            knielsen Kristian Nielsen made changes -
            Assignee Kristian Nielsen [ knielsen ] Brandon Nesterenko [ JIRAUSER48702 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            Pushed on Monty's behalf into 11.6 as 25b5c63905682a0e38d8ed86170d98be6ddb8cf8

            bnestere Brandon Nesterenko added a comment - Pushed on Monty's behalf into 11.6 as 25b5c63905682a0e38d8ed86170d98be6ddb8cf8
            bnestere Brandon Nesterenko made changes -
            Component/s Information Schema [ 14413 ]
            Component/s Replication [ 10100 ]
            Fix Version/s 11.6.0 [ 29839 ]
            Fix Version/s 11.6 [ 29515 ]
            Assignee Brandon Nesterenko [ JIRAUSER48702 ] Michael Widenius [ monty ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            bnestere Brandon Nesterenko made changes -
            Description The current definition of Seconds_Behind_Master is both complex and confusing.
            This can be seen in the documentation of the variable at:
            https://mariadb.com/kb/en/delayed-replication/

            The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

            The suggested changes are:
            * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
            * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

            As an example with SQL_DELAY=5 hours:
            - IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
            - The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

            In this case the Seconds_Behind_master will be 5 hours.

            The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
            * Without and without delayed replication
            * When the master has been down and is just coming up.
            * If the master has been 'idle' for some time.
            * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.
            The current definition of Seconds_Behind_Master is both complex and confusing.
            This can be seen in the documentation of the variable at:
            https://mariadb.com/kb/en/delayed-replication/

            The current value is also not very useful as the value can have strange values in the case the master has long pauses between entries.

            The suggested changes are:
            * Parallel execution uses the time *AFTER* execution while 'normal' replication used the value *BEFORE* to calculate Seconds_Behind_Master. This should be changed to always use *AFTER*.
            * Change Seconds_Behind_Master to have a more well defined meaning: 'The slave data is snapshot XXX seconds behind the data on the master. This can be calculated by using the formula: *Newest event completion time from master* (when writing to the relay log) - *Completion time for last committed event from the master*. If the slave is 'idle' then the Seconds_Behind_Master should be 0.

            As an example with SQL_DELAY=5 hours:
            - IO thread reads an event that was completed at 2023:01:01 06:01:01 on the master
            - The SQL thread commits later an event that was originally committed at 2023:01:01 01:01:01

            In this case the Seconds_Behind_master will be 5 hours.

            The benefit of this idea is that 'Seconds_Behind_Master' will be well defined in all of the following cases:
            * Without and without delayed replication
            * When the master has been down and is just coming up.
            * If the master has been 'idle' for some time.
            * When the slave is stopped and later restarted (Seconds_Behind_Master will be up to date as soon as IO thread is up to date). We could have make the value 'correct' from start by setting *Newest event completion time from master* as 'master current time' when connecting.

            Note too that the final commit also adds in the table information_schema.slave_status (MDEV-33526), as an alias for SHOW ALL SLAVES STATUS. So these fields can now be queried by SELECT statements.
            bnestere Brandon Nesterenko made changes -
            bnestere Brandon Nesterenko made changes -
            bnestere Brandon Nesterenko made changes -
            bnestere Brandon Nesterenko made changes -
            bnestere Brandon Nesterenko made changes -
            bnestere Brandon Nesterenko made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            bnestere Brandon Nesterenko made changes -
            Elkin Andrei Elkin made changes -
            bnestere Brandon Nesterenko made changes -
            ParadoxV5 Jimmy Hú made changes -
            susil.behera Susil Behera made changes -
            bnestere Brandon Nesterenko made changes -
            ParadoxV5 Jimmy Hú made changes -

            People

              monty Michael Widenius
              monty Michael Widenius
              Votes:
              3 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.