  MariaDB Server / MDEV-33316

rpl_change_master_demote binlog race condition

Details

    • Type: Bug
    • Status: Stalled
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.11, 11.0 (EOL), 11.1 (EOL), 11.2 (EOL), 11.3 (EOL)
    • Fix Version/s: 10.11
    • Component/s: Replication, Tests
    • Labels: None

    Description

      Between test cases 3 and 4 of rpl_change_master_demote.test, the master can send ER_GTID_POSITION_NOT_FOUND_IN_BINLOG2 to the slave. The cause is a race: the slave requests to start replicating from a GTID that is not in the master's binary logs while, at the same time, the master binlogs a new transaction. That transaction moves the master's gtid_binlog_pos ahead of the slave's requested slave_connect_state, so the connection request is rejected and the slave never catches up.
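      The ordering problem can be illustrated with a toy model of the master's start-position check. This is a simplified sketch, not MariaDB's implementation: the class Gtid, the function check_start_position, and the pre-INSERT binlog content 0-1-9 are hypothetical; only 0-2-9 and 0-1-10 come from the failure log. The only rule modeled is the one named in the description: a requested GTID that is absent from the binlog is rejected once the binlog already holds a later event in the same domain.

```python
# Toy model of the race (hypothetical, not MariaDB source code).
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Gtid:
    domain: int
    server: int
    seq: int

    @classmethod
    def parse(cls, text: str) -> "Gtid":
        domain, server, seq = (int(part) for part in text.split("-"))
        return cls(domain, server, seq)


def check_start_position(requested: Gtid, binlog: Set[Gtid]) -> Optional[str]:
    """Return an error name if the request is rejected, else None."""
    if requested in binlog:
        return None
    # The requested GTID is missing: reject only if the binlog already
    # contains a later event in the same replication domain.
    if any(g.domain == requested.domain and g.seq > requested.seq for g in binlog):
        return "ER_GTID_POSITION_NOT_FOUND_IN_BINLOG2"
    return None  # assumption: all other cases are accepted in this sketch


requested = Gtid.parse("0-2-9")   # slave position reported in the log
binlog = {Gtid.parse("0-1-9")}    # hypothetical binlog before the INSERT

# Slave connects first: no later event in domain 0 yet.
print(check_start_position(requested, binlog))  # None

# Master binlogs INSERT INTO t1 VALUES (3) first, as 0-1-10.
binlog_after = binlog | {Gtid.parse("0-1-10")}
print(check_start_position(requested, binlog_after))  # ER_GTID_POSITION_NOT_FOUND_IN_BINLOG2
```

      The outcome depends only on whether the slave's request is processed before or after the master binlogs 0-1-10, which is why the failure is intermittent.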

      The resulting failure looks like:

      rpl.rpl_change_master_demote 'mix'       w1 [ fail ]
              Test ended at 2024-01-24 10:39:40
       
      CURRENT_TEST: rpl.rpl_change_master_demote
      mysqltest: In included file "./include/sync_with_master_gtid.inc": 
      included from /home/buildbot/buildbot/build/mariadb-10.11.7/mysql-test/suite/rpl/t/rpl_change_master_demote.test at line 149:
      At line 48: Failed to sync with master
       
      The result from queries just before the failure was:
      < snip >
      #  * True primary is back to connection 'master'
      #  * True replica is back to connection 'slave'
      ##############################################
      connection master;
      connection slave;
      CHANGE MASTER TO master_host='127.0.0.1', master_port=MASTER_PORT, master_user='root', master_use_gtid=slave_pos, master_demote_to_slave=1;
      #
      # Test Case 4: If gtid_slave_pos and gtid_binlog_pos are equivalent,
      # MASTER_DEMOTE_TO_SLAVE=1 will not change gtid_slave_pos.
      #
      connection master;
      # update gtid_binlog_pos and demote it (we have proven this works)
      INSERT INTO t1 VALUES (3);
      # Update to account for statements to verify replication in include file
      CHANGE MASTER TO master_host='127.0.0.1', master_port=SLAVE_PORT, master_user='root', master_use_gtid=slave_pos, master_demote_to_slave=1;
      RESET SLAVE ALL;
      include/save_master_gtid.inc
      connection slave;
      include/sync_with_master_gtid.inc
      Timeout in master_gtid_wait('0-1-10', 120), current slave GTID position is: 0-2-9.
       
      More results from queries before failure can be found in /dev/shm/var/1/log/rpl_change_master_demote.log
       
       - saving '/dev/shm/var/1/log/rpl.rpl_change_master_demote-mix/' to '/dev/shm/var/log/rpl.rpl_change_master_demote-mix/'
       
      Retrying test rpl.rpl_change_master_demote, attempt(2/2)...
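      The timeout is the visible symptom of the rejected connection. MASTER_GTID_WAIT waits until gtid_slave_pos has the same or a higher seq_no than the given GTID in each listed domain. The sketch below models only that comparison (no blocking or timeout); parse_gtid_list and position_reached are hypothetical helpers.

```python
# Hypothetical model of the condition MASTER_GTID_WAIT waits for:
# every domain in the wait list must have an equal or higher seq_no
# in the slave position. Server IDs are not compared.
from typing import Dict, Tuple


def parse_gtid_list(text: str) -> Dict[int, Tuple[int, int]]:
    """Map domain_id -> (server_id, seq_no) for a list like '0-1-10,1-3-4'."""
    result: Dict[int, Tuple[int, int]] = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        domain, server, seq = (int(x) for x in item.split("-"))
        result[domain] = (server, seq)
    return result


def position_reached(wait_for: str, slave_pos: str) -> bool:
    target = parse_gtid_list(wait_for)
    current = parse_gtid_list(slave_pos)
    return all(
        domain in current and current[domain][1] >= seq
        for domain, (_server, seq) in target.items()
    )


# Values from the failure above: the slave stays at 0-2-9 because its
# connection request was rejected, so 0-1-10 is never reached.
print(position_reached("0-1-10", "0-2-9"))   # False
print(position_reached("0-1-10", "0-1-10"))  # True
```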
      

          Activity

            This test can also fail with:

            rpl.rpl_change_master_demote 'mix'       w64 [ fail ]
                    Test ended at 2024-05-20 09:17:08     
             
            CURRENT_TEST: rpl.rpl_change_master_demote
            mysqltest: At line 292: "IO thread should not be running after START SLAVE UNTIL master_gtid_pos using a pre-existing GTID"
             
            The result from queries just before the failure was:
            < snip >
            SELECT VARIABLE_NAME, GLOBAL_VALUE FROM INFORMATION_SCHEMA.SYSTEM_VARIABLES WHERE VARIABLE_NAME LIKE 'gtid_binlog_pos' OR VARIABLE_NAME LIKE 'gtid_slave_pos' ORDER BY VARIABLE_NAME ASC;
            VARIABLE_NAME GLOBAL_VALUE
            GTID_BINLOG_POS 0-1-26,1-3-4,2-1-3,3-1-2,4-3-2
            GTID_SLAVE_POS  0-2-24,1-3-4,2-1-3,3-1-2,4-3-2
            CHANGE MASTER TO master_host='127.0.0.1', master_port=SLAVE_PORT, master_user='root', master_use_gtid=Slave_Pos, master_demote_to_slave=1;
            SELECT VARIABLE_NAME, GLOBAL_VALUE FROM INFORMATION_SCHEMA.SYSTEM_VARIABLES WHERE VARIABLE_NAME LIKE 'gtid_binlog_pos' OR VARIABLE_NAME LIKE 'gtid_slave_pos' ORDER BY VARIABLE_NAME ASC;
            VARIABLE_NAME GLOBAL_VALUE
            GTID_BINLOG_POS 0-1-26,1-3-4,2-1-3,3-1-2,4-3-2
            GTID_SLAVE_POS  0-1-26,1-3-4,2-1-3,3-1-2,4-3-2
            # GTID ssu_middle_binlog_pos should be considered in the past because
            # gtid_slave_pos should be updated using the latest binlog gtids.
            # The following call to sync_with_master_gtid.inc uses the latest
            # binlog position and should still succeed despite the SSU stop 
            # position pointing to a previous event (because
            # master_demote_to_slave=1 merges gtid_binlog_pos into gtid_slave_pos).
            START SLAVE UNTIL master_gtid_pos="ssu_middle_binlog_pos";
            Warnings:
            Note  1278  It is recommended to use --skip-slave-start when doing step-by-step replication with START SLAVE UNTIL; otherwise, you will get problems if you get an unexpected slave's mariadbd restart
            # Slave needs time to start and stop automatically
            # Validating neither SQL nor IO threads are running..
            

            (Comment by Brandon Nesterenko, bnestere.)
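            The test comment relies on master_demote_to_slave=1 merging gtid_binlog_pos into gtid_slave_pos. The sketch below reproduces the before/after values from the log under an assumed merge rule (the binlog GTID wins per domain), then checks the expectation behind the failing assertion: once the merged position is past the UNTIL position, START SLAVE UNTIL should stop both threads right away. The value used for ssu_middle_binlog_pos is hypothetical, since the log shows only the variable name, and the per-domain seq_no comparison is likewise an assumption.

```python
# Hypothetical sketch of the demote merge and the START SLAVE UNTIL check.
# Assumption: per domain, the gtid_binlog_pos entry replaces the
# gtid_slave_pos entry; domains only in gtid_slave_pos are kept.
from typing import Dict, Tuple

GtidMap = Dict[int, Tuple[int, int]]  # domain_id -> (server_id, seq_no)


def parse(text: str) -> GtidMap:
    gtids: GtidMap = {}
    for item in filter(None, (p.strip() for p in text.split(","))):
        domain, server, seq = (int(x) for x in item.split("-"))
        gtids[domain] = (server, seq)
    return gtids


def format_pos(gtids: GtidMap) -> str:
    return ",".join(f"{d}-{s}-{n}" for d, (s, n) in sorted(gtids.items()))


def demote_merge(binlog_pos: str, slave_pos: str) -> str:
    merged = parse(slave_pos)
    merged.update(parse(binlog_pos))  # binlog GTIDs take precedence
    return format_pos(merged)


def until_already_reached(until_pos: str, slave_pos: str) -> bool:
    # Assumed rule: the UNTIL position is "in the past" when every listed
    # domain has an equal or higher seq_no in the slave position.
    current = parse(slave_pos)
    return all(d in current and current[d][1] >= seq
               for d, (_server, seq) in parse(until_pos).items())


# Values copied from the log above.
binlog_pos = "0-1-26,1-3-4,2-1-3,3-1-2,4-3-2"
slave_pos_before = "0-2-24,1-3-4,2-1-3,3-1-2,4-3-2"

slave_pos_after = demote_merge(binlog_pos, slave_pos_before)
print(slave_pos_after)  # 0-1-26,1-3-4,2-1-3,3-1-2,4-3-2

ssu_middle_binlog_pos = "0-1-25"  # hypothetical; the real value is not shown
print(until_already_reached(ssu_middle_binlog_pos, slave_pos_after))  # True
```

            A rule that instead keeps the higher seq_no per domain yields the same result for these particular values, so the log alone does not distinguish the two.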

            People

              Assignee: Brandon Nesterenko (bnestere)
              Reporter: Brandon Nesterenko (bnestere)
              Votes: 0
              Watchers: 2
