Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31749

New test rpl.rpl_parallel_sbm in bb-10.4-MDEV-30619 sporadically fails in various locations (prepatch: lines 100, 177, 184) (postpatch_1: lines 180, 187)

    XMLWordPrintable

Details

    Description

      mysql-test$ ./mtr --repeat 1000 rpl.rpl_parallel_sbm
      

      Leads to:

      bb-10.4-MDEV-30619 10.4.31 07745e9e6a00ff6089c68f48d453dfa368f064f4 (Debug)

      ...
      rpl.rpl_parallel_sbm 'mix'               [ 18 pass ]  18674
      rpl.rpl_parallel_sbm 'mix'               [ 19 pass ]  18698
      rpl.rpl_parallel_sbm 'mix'               [ 20 fail ]
              Test ended at 2023-07-20 10:13:11
       
      CURRENT_TEST: rpl.rpl_parallel_sbm
      mysqltest: At line 100: Seconds_Behind_Master should not have used second transaction timestamp
       
      The result from queries just before the failure was:
      < snip >
      # Waiting for transaction to arrive on slave and begin SQL Delay..
      # Validating SBM is updated on event arrival..
      # ..done
      connection slave;
      UNLOCK TABLES;
      include/sync_with_master_gtid.inc
      #
      # Pt 2) If the SQL thread has not entered an idle state, ensure
      # following events do not update SBM
      connection master;
      # Sleep 2 to allow a buffer between events for SBM check
      insert into t1 values (1);
      # Sleep 3 to create gap between events
      insert into t1 values (2);
      include/save_master_pos.inc
      connection slave;
      LOCK TABLES t1 WRITE;
      # Wait for first transaction to complete SQL delay and begin execution..
      # Validate SBM calculation doesn't use the second transaction because SQL thread shouldn't have gone idle..
      # SBM 3 was more recent than time since last transaction (4 seconds)
      

      Or, in another instance:

      bb-10.4-MDEV-30619 10.4.31 07745e9e6a00ff6089c68f48d453dfa368f064f4 (Optimized)

      ...
      rpl.rpl_parallel_sbm 'mix'               [ 19 pass ]  18622
      rpl.rpl_parallel_sbm 'mix'               [ 20 pass ]  19592
      rpl.rpl_parallel_sbm 'mix'               [ 21 fail ]
              Test ended at 2023-07-20 12:47:36
       
      CURRENT_TEST: rpl.rpl_parallel_sbm
      mysqltest: At line 177: Seconds_Behind_Master is too recent
       
      The result from queries just before the failure was:
      < snip >
      connection master;
      include/sync_slave_sql_with_master.inc
      # Lock t1 on slave so the first received transaction does not complete/commit
      connection slave;
      LOCK TABLES t1 WRITE;
      connection master;
      INSERT INTO t1 VALUES (1);
      include/save_master_gtid.inc
      connection slave;
      # Waiting for first transaction to begin..
      # Wait for SQL thread to catch up and idle
      # Sleep 2s to create a gap between events
      LOCK TABLES mysql.gtid_slave_pos WRITE;
      connection master;
      INSERT INTO t2 VALUES (1);
      include/save_master_gtid.inc
      connection slave;
      # Waiting for second transaction to begin..
      # Validate that the second transaction doesn't update Seconds_Behind_Master on reception after the SQL thread has gone idle..
      # Seconds_Behind_Master was 2 yet should not have been less than 3 (1689821254 - 1689821251)
      

      And:

      bb-10.4-MDEV-30619 10.4.31 07745e9e6a00ff6089c68f48d453dfa368f064f4 (Optimized)

      ...
      rpl.rpl_parallel_sbm 'mix'               [ 85 pass ]  18592
      rpl.rpl_parallel_sbm 'mix'               [ 86 pass ]  19673
      rpl.rpl_parallel_sbm 'mix'               [ 87 fail ]
              Test ended at 2023-07-20 13:33:27
       
      CURRENT_TEST: rpl.rpl_parallel_sbm
      mysqltest: At line 184: Seconds_Behind_Master is too old
       
      The result from queries just before the failure was:
      < snip >
      include/sync_slave_sql_with_master.inc
      # Lock t1 on slave so the first received transaction does not complete/commit
      connection slave;
      LOCK TABLES t1 WRITE;
      connection master;
      INSERT INTO t1 VALUES (1);
      include/save_master_gtid.inc
      connection slave;
      # Waiting for first transaction to begin..
      # Wait for SQL thread to catch up and idle
      # Sleep 2s to create a gap between events
      LOCK TABLES mysql.gtid_slave_pos WRITE;
      connection master;
      INSERT INTO t2 VALUES (1);
      include/save_master_gtid.inc
      connection slave;
      # Waiting for second transaction to begin..
      # Validate that the second transaction doesn't update Seconds_Behind_Master on reception after the SQL thread has gone idle..
      # ..and that its value correctly corresponds to the first transaction's timestamp on the primary
      # Seconds_Behind_Master was 3 yet should not have been more than 2 (1689824005 - 1689824003)
      

      It usually takes between 20 to 50 attempts to reproduce, though it can take 100+. System load may affect reproducibility, more load is presumably better.
      Observed on both debug and optimized builds as shown above.

      This particular test is not available prior to the MDEV-30619 fixes, and the previous testcase (executed against a pre-MDEV-30619 debug (this test requires debug) build of 10.4.31 at ea4b8d4ce99d0f427dd7ed40b5eeb15e41eec01e) :

      ./mtr --repeat 100 rpl.rpl_delayed_parallel_slave_sbm
      

      Does not produce any issue (400+ attempts).
      May affect later MariaDB versions also.

      Attachments

        Issue Links

          Activity

            People

              Elkin Andrei Elkin
              Roel Roel Van de Paar
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.