MariaDB Server / MDEV-29639

Seconds_Behind_Master is incorrect for Delayed, Parallel Replicas

Details

    Description

      Delayed replicas (those using the MASTER_DELAY option of CHANGE MASTER TO) that are also configured to use parallel threads calculate Seconds_Behind_Master incorrectly. An earlier commit changed parallel replicas to update Seconds_Behind_Master at the time of transaction commit. On a delayed replica, however, an event's Seconds_Behind_Master is not calculated until MASTER_DELAY seconds have passed and the event has finished executing. In other words, when a new event is received, Seconds_Behind_Master is calculated from the time of the last committed event, which can produce very large values for the entire duration of MASTER_DELAY. The effect is most pronounced for workloads with infrequent transactions.
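The arithmetic behind the inflated value can be illustrated with a small toy model. This is only a sketch of the behaviour described above, not the server's implementation: the class `ReplicaModel`, its fields, and the methods `receive`, `tick`, `sbm_reported`, and `sbm_from_pending` are hypothetical names invented for illustration. The "alternative" calculation is shown purely for comparison and is not claimed to be the intended fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaModel:
    """Toy model of a delayed, parallel replica (hypothetical, illustration only)."""
    master_delay: int                 # MASTER_DELAY, in seconds
    last_commit_ts: int               # master timestamp of the last committed event
    pending_ts: Optional[int] = None  # master timestamp of a received, uncommitted event

    def receive(self, event_ts: int) -> None:
        # A new event arrives and starts waiting out MASTER_DELAY.
        self.pending_ts = event_ts

    def tick(self, now: int) -> None:
        # Commit the pending event once MASTER_DELAY seconds have elapsed.
        if self.pending_ts is not None and now >= self.pending_ts + self.master_delay:
            self.last_commit_ts = self.pending_ts
            self.pending_ts = None

    def sbm_reported(self, now: int) -> int:
        # Behaviour described in this report (simplified): while an event is
        # pending, the lag is measured from the last *committed* event.
        if self.pending_ts is None:
            return 0
        return now - self.last_commit_ts

    def sbm_from_pending(self, now: int) -> int:
        # Assumed alternative, for comparison only: measure the lag from the
        # event that is currently waiting.
        if self.pending_ts is None:
            return 0
        return now - self.pending_ts


# Last commit at t=0, then 10 idle seconds before the next event arrives.
replica = ReplicaModel(master_delay=4, last_commit_ts=0)
idle = replica.sbm_reported(now=9)             # nothing pending yet
replica.receive(event_ts=10)
reported = replica.sbm_reported(now=10)        # measured from t=0
alternative = replica.sbm_from_pending(now=10) # measured from t=10
print(idle, reported, alternative)             # prints: 0 10 0
```

In this model the reported value jumps to the length of the idle gap the moment the event arrives, and keeps growing until the delayed event commits.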

      The following MTR test highlights this issue:

      --source include/master-slave.inc
      --source include/have_binlog_format_row.inc
       
      --echo #
      --echo # Initialize test data
      --connection master
      create table t1 (a int);
      insert into t1 values (1);
      --source include/save_master_gtid.inc
       
      --connection slave
      --source include/sync_with_master_gtid.inc
      --source include/stop_slave.inc
      CHANGE MASTER TO MASTER_DELAY=4, MASTER_USE_GTID=Slave_Pos;
      set @@global.slave_parallel_threads= 4;
      --source include/start_slave.inc
       
      --echo # Set up a long interval between now and the next event to boost SBM
      --connection master
      --sleep 10
       
      --let $ctr=8
      while($ctr)
      {
          --connection slave
       
          # On the first iteration, SBM will be 0 because there are no new events
          --let $status_items= Seconds_Behind_Master
          --source include/show_slave_status.inc
       
          --connection master
          --eval insert into t1 values ($ctr)
          --send select sleep(1)
          --dec $ctr
       
          # On the first iteration, SBM will boost to 10 because of the long
          # interval, despite only just receiving the event
          --connection slave
          --source include/show_slave_status.inc
       
          --connection master
          --reap
      }
       
       
       
      --echo #
      --echo # Cleanup
      --connection master
      DROP TABLE t1;
      --source include/save_master_gtid.inc
       
      --connection slave
      --source include/sync_with_master_gtid.inc
      --source include/stop_slave.inc
      CHANGE MASTER TO MASTER_DELAY=0;
      set @@global.slave_parallel_threads= 0;
      --source include/start_slave.inc
       
      --source include/rpl_end.inc
       
      --echo # End of tests
      

            People

              bnestere Brandon Nesterenko