Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34354

Shows replication time difference in master_last_event_time after setting MASTER_DELAY > 0 in chain replication

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 11.6.0
    • 11.6.0
    • Replication
    • None

    Description

      Found a small difference in Master_last_event_time in replica2 when setting up chain replication. The issue found only when setting MASTER_DELAY > 0 on replica1

      Test case

      MariaDB [test]> create or replace table t1 select seq from seq_1_to_1000;select now();
      Query OK, 1000 rows affected (0.086 sec)
      Records: 1000  Duplicates: 0  Warnings: 0
       
      +---------------------+
      | now()               |
      +---------------------+
      | 2024-06-07 10:16:25 |
      +---------------------+
      1 row in set (0.000 sec)
       
      MariaDB [test]> 
      

      replica status from server replica1. Master delay configured to 5 sec on replica1 SQL_Delay: 5

                       Parallel_Mode: aggressive
                           SQL_Delay: 5
                 SQL_Remaining_Delay: NULL
             Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
                    Slave_DDL_Groups: 8
      Slave_Non_Transactional_Groups: 2
          Slave_Transactional_Groups: 1
                Retried_transactions: 0
                  Max_relay_log_size: 1073741824
                Executed_log_entries: 1327
           Slave_received_heartbeats: 67
              Slave_heartbeat_period: 30.000
                      Gtid_Slave_Pos: 11-100-11
              Master_last_event_time: 2024-06-07 10:16:25
               Slave_last_event_time: 2024-06-07 10:16:25
              Master_Slave_time_diff: 0
      1 row in set (0.000 sec)
       
      MariaDB [test]> 
      

      replica status from server replica2, please check Master_last_event_time difference

                           SQL_Delay: 0
                 SQL_Remaining_Delay: NULL
             Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
                    Slave_DDL_Groups: 1
      Slave_Non_Transactional_Groups: 9
          Slave_Transactional_Groups: 1
                Retried_transactions: 0
                  Max_relay_log_size: 1073741824
                Executed_log_entries: 1308
           Slave_received_heartbeats: 91
              Slave_heartbeat_period: 30.000
                      Gtid_Slave_Pos: 11-100-11
              Master_last_event_time: 2024-06-07 10:16:30
               Slave_last_event_time: 2024-06-07 10:16:25
              Master_Slave_time_diff: 5
      1 row in set (0.000 sec)
       
      MariaDB [test]> 
      

      Attachments

        Issue Links

          Activity

            ramesh Ramesh Sivaraman created issue -

            The issue is also present in `Seconds_behind_master`. Thank you bnestere for sharing the MTR test case to test SBM issue

            ramesh Ramesh Sivaraman added a comment - The issue is also present in `Seconds_behind_master`. Thank you bnestere for sharing the MTR test case to test SBM issue

            The issue is that the slave overwrites the binlog event's exec_time, which includes the delay time. This makes Seconds_behind_master and master_last_event_time incorrect on slaves that follow a chain. See the following MTR test case highlighting the issue with SBM

            --source include/have_binlog_format_statement.inc
             
            --let $rpl_topology= 1->2->3
            --source include/rpl_init.inc
             
            --connection server_1
            --source include/have_innodb.inc
            --connection server_2
            --source include/have_innodb.inc
            --connection server_3
            --source include/have_innodb.inc
            --connection server_1
             
            --connection server_1
            create table t1 (a int) engine=aria;
            --source include/save_master_gtid.inc
            --connection server_3
            --source include/sync_with_master_gtid.inc
             
            --connection server_2
            --source include/stop_slave.inc
            change master to master_delay=5;
            --source include/start_slave.inc
             
            --connection server_3
            --source include/stop_slave.inc
            change master to master_delay=10;
            --source include/start_slave.inc
            lock tables t1 write;
             
            --connection server_1
            insert into t1 values (100);
            --source include/save_master_gtid.inc
             
            --connection server_2
            --source include/sync_with_master_gtid.inc
            --echo # Server_2 has new event
             
            --connection server_3
            --echo #
            --echo # Sleep 3s
            --sleep 3
            --let $sbm= query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1)
            --echo # Server_3 is waiting for delay, has correct SBM: $sbm
             
            --echo #
            --echo # Now wait ~2s for server_3's master_delay to end
            --let $wait_condition= SELECT count(*) FROM information_schema.processlist WHERE state LIKE 'Waiting for table metadata lock';
            --source include/wait_condition.inc
            --let $sbm= query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1)
            --echo # Once server_3 delay ends (now is waiting on lock), its SBM jumps by
            --echo # server_2's MASTER_DELAY-(~2 seconds): $sbm
            unlock tables;
            --source include/sync_with_master_gtid.inc
             
            --connection server_2
            --source include/stop_slave.inc
            change master to master_delay=0;
            --source include/start_slave.inc
            --connection server_3
            --source include/stop_slave.inc
            change master to master_delay=0;
            --source include/start_slave.inc
             
            --connection server_1
            drop table t1;
             
            --source include/rpl_end.inc
            

            With .cnf file

            !include include/default_mysqld.cnf
             
            [mysqld.1]
            log-slave-updates
            gtid_domain_id=0
             
            [mysqld.2]
            log-slave-updates
             
            [mysqld.3]
            log-slave-updates
             
            [ENV]
            SERVER_MYPORT_1= @mysqld.1.port
            SERVER_MYPORT_2= @mysqld.2.port
            SERVER_MYPORT_3= @mysqld.3.port
            

            As the transaction is still executing, it would be expected that after 2s, the SBM would go from 8 -> 10; but instead, it jumps down to 5.

            bnestere Brandon Nesterenko added a comment - The issue is that the slave overwrites the binlog event's exec_time, which includes the delay time. This makes Seconds_behind_master and master_last_event_time incorrect on slaves that follow a chain. See the following MTR test case highlighting the issue with SBM --source include/have_binlog_format_statement.inc   --let $rpl_topology= 1->2->3 --source include/rpl_init.inc   --connection server_1 --source include/have_innodb.inc --connection server_2 --source include/have_innodb.inc --connection server_3 --source include/have_innodb.inc --connection server_1   --connection server_1 create table t1 (a int) engine=aria; --source include/save_master_gtid.inc --connection server_3 --source include/sync_with_master_gtid.inc   --connection server_2 --source include/stop_slave.inc change master to master_delay=5; --source include/start_slave.inc   --connection server_3 --source include/stop_slave.inc change master to master_delay=10; --source include/start_slave.inc lock tables t1 write;   --connection server_1 insert into t1 values (100); --source include/save_master_gtid.inc   --connection server_2 --source include/sync_with_master_gtid.inc --echo # Server_2 has new event   --connection server_3 --echo # --echo # Sleep 3s --sleep 3 --let $sbm= query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1) --echo # Server_3 is waiting for delay, has correct SBM: $sbm   --echo # --echo # Now wait ~2s for server_3's master_delay to end --let $wait_condition= SELECT count(*) FROM information_schema.processlist WHERE state LIKE 'Waiting for table metadata lock'; --source include/wait_condition.inc --let $sbm= query_get_value(SHOW SLAVE STATUS, Seconds_Behind_Master, 1) --echo # Once server_3 delay ends (now is waiting on lock), its SBM jumps by --echo # server_2's MASTER_DELAY-(~2 seconds): $sbm unlock tables; --source include/sync_with_master_gtid.inc   --connection server_2 --source include/stop_slave.inc change master to master_delay=0; --source include/start_slave.inc --connection server_3 --source include/stop_slave.inc change master to master_delay=0; --source include/start_slave.inc   --connection server_1 drop table t1;   --source include/rpl_end.inc With .cnf file !include include/default_mysqld.cnf   [mysqld.1] log-slave-updates gtid_domain_id=0   [mysqld.2] log-slave-updates   [mysqld.3] log-slave-updates   [ENV] SERVER_MYPORT_1= @mysqld.1.port SERVER_MYPORT_2= @mysqld.2.port SERVER_MYPORT_3= @mysqld.3.port As the transaction is still executing, it would be expected that after 2s, the SBM would go from 8 -> 10; but instead, it jumps down to 5.
            bnestere Brandon Nesterenko made changes -
            Field Original Value New Value
            Assignee Brandon Nesterenko [ JIRAUSER48702 ] Michael Widenius [ monty ]
            bnestere Brandon Nesterenko made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            bnestere Brandon Nesterenko made changes -
            bnestere Brandon Nesterenko made changes -
            Fix Version/s 11.6.0 [ 29839 ]
            Fix Version/s 10.6 [ 24028 ]
            Affects Version/s 11.6.0 [ 29839 ]
            Affects Version/s 10.6 [ 24028 ]

            This issue was fixed in the official commit for MDEV-33856 in 11.6.0.

            bnestere Brandon Nesterenko added a comment - This issue was fixed in the official commit for MDEV-33856 in 11.6.0.
            bnestere Brandon Nesterenko made changes -
            issue.field.resolutiondate 2024-08-27 21:19:07.0 2024-08-27 21:19:07.052
            bnestere Brandon Nesterenko made changes -
            Resolution Fixed [ 1 ]
            Status Confirmed [ 10101 ] Closed [ 6 ]

            People

              monty Michael Widenius
              ramesh Ramesh Sivaraman
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.