
MDEV-23586: Mariabackup: GTID saved for replication in 10.4.14 is wrong

Details

    Description

      It appears that the fix for https://jira.mariadb.org/browse/MDEV-21953 introduced a serious regression: any backup taken with MariaDB Server 10.4.14 and above saves the wrong GTID.

      To reproduce, take a mariabackup from a 10.4.14 instance, then notice that the reported position refers to an incorrect GTID. In the example below, the PIT corresponds to position 25035029, but the reported GTID 0-1-18691 is wrong and, if used, yields several duplicate key errors. The correct GTID for that position would be 0-1-18699, as reported by the BINLOG_GTID_POS() function.

      This wrong GTID is saved to xtrabackup_slave_info, and since the binlog position is no longer saved there (why was that removed?), the resulting backup no longer carries valid PIT information.
      This is hitting a customer of ours and makes all backups inconsistent due to wrong PIT info, hence the critical priority.

      Thanks to aponzo who found and verified this issue.

      Workaround: downgrade server to 10.4.13

      [00] 2020-08-26 12:25:52 Backup created in directory '/root/backup/andre2/xtrabackup_backupfiles/'
      [00] 2020-08-26 12:25:52 MySQL binlog position: filename 'm2-bin.000007', position '25035029', GTID of the last change '0-1-18691'
      [00] 2020-08-26 12:25:52 MySQL slave binlog position: master host '192.168.56.101', gtid_slave_pos 0-1-18691
      [00] 2020-08-26 12:25:52 Streaming backup-my.cnf
      [00] 2020-08-26 12:25:52         ...done
      [00] 2020-08-26 12:25:52 Streaming xtrabackup_info
      [00] 2020-08-26 12:25:52         ...done
      [00] 2020-08-26 12:25:52 Redo log (from LSN 29523906 to 31884588) was copied.
      [00] 2020-08-26 12:25:52 completed OK!
      Output of mariabackup --prepare:
      2020-08-26 12:26:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=30269374
      2020-08-26 12:26:53 0 [Note] InnoDB: Starting final batch to recover 286 pages from redo log.
      2020-08-26 12:26:53 0 [Note] InnoDB: Last binlog file './m2-bin.000007', position 25035029
      [00] 2020-08-26 12:26:54 Last binlog file ./m2-bin.000007, position 25035029
      [00] 2020-08-26 12:26:54 completed OK!
      MariaDB [test]> SELECT BINLOG_GTID_POS("m2-bin.000007",25035029 );
      +--------------------------------------------+
      | BINLOG_GTID_POS("m2-bin.000007",25035029 ) |
      +--------------------------------------------+
      | 0-1-18699                                  |
      +--------------------------------------------+
      1 row in set (0.133 sec)
      


          Activity

            rpizzi Rick Pizzi (Inactive) added a comment - - edited

            From further testing:

            • the problem hits with slave_parallel_threads = 0 too
            • the drift between the incorrect and the correct GTID equals the number of parallel threads (with no parallel threads there is a single thread, so the drift is 1)

            GeoffMontee Geoff Montee (Inactive) added a comment -

            In the example below, the PIT corresponds to position 25035029, but the reported GTID 0-1-18691 is wrong and, if used, yields several duplicate key errors. In fact the correct GTID for that position would be 0-1-18699, as reported by the BINLOG_GTID_POS() function.

            Are you using the position printed by mariabackup --prepare to validate the correctness of the position in xtrabackup_binlog_info, i.e. this position?

            2020-08-26 12:26:53 0 [Note] InnoDB: Last binlog file './m2-bin.000007', position 25035029
            [00] 2020-08-26 12:26:54 Last binlog file ./m2-bin.000007, position 25035029
            

            If so, then I want to point out that the position printed by mariabackup --prepare is not guaranteed to be consistent, so it is not a good idea to use this position to validate the position in xtrabackup_binlog_info. In fact, the position printed by mariabackup --prepare should never be used at all. For more information about that, see MDEV-21611.

            Only the positions in xtrabackup_binlog_info and xtrabackup_slave_info are currently guaranteed to be consistent by Mariabackup.

            In other words, it is totally normal and expected if the position saved in xtrabackup_binlog_info is different from the position printed by Mariabackup, because only the former position is supposed to be consistent.

            Is your position in xtrabackup_binlog_info actually incorrect? I don't know, so I can't comment on that. I just want to point out that it is unsafe to use the position printed by mariabackup --prepare to make any judgement about the correctness of the position in xtrabackup_binlog_info.


            rpizzi Rick Pizzi (Inactive) added a comment -

            The printed position and the position saved in the file are identical.


            monty Michael Widenius added a comment -

            What is PIT? Point in time? (Binlog position would maybe be a better term in that case.)

            From what I understand from above, the printed binlog file and positions are correct.
            Only the GTID printed by mariabackup is wrong. Is this correct?

            rpizzi Rick Pizzi (Inactive) added a comment - - edited

            Yes, sorry, I meant the binlog position - which is correct; only the GTID is wrong.

            The issue is that in the file xtrabackup_slave_info only the GTID is saved. I think it would be best to go back to also printing the position...

            Rick

            rpizzi Rick Pizzi (Inactive) added a comment - - edited

            julien.fritsch A test case is more than trivial to build for this issue... just set up one master and one replica, with GTID enabled and slave_parallel_threads > 1, then take a mariabackup on the replica and verify that the GTID saved in the slave_info file does not correspond to the GTID of the saved position as reported by BINLOG_GTID_POS(); a sketch of the replica setup follows below.
            You need some traffic flowing on the master for this, of course; we used sysbench.
            As additional proof: if the sysbench workload is inserting rows, using the recommended GTID to set up replication will result in duplicate key errors.
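
            A minimal sketch of the replica-side setup for this reproduction; the host is taken from the backup log above, while the replication user, password, and thread count are illustrative placeholders:

            STOP SLAVE;
            SET GLOBAL slave_parallel_threads = 4;   -- > 1 makes the GTID drift visible
            CHANGE MASTER TO
              MASTER_HOST = '192.168.56.101',        -- master A's address (from the log above)
              MASTER_USER = 'repl',                  -- hypothetical replication user
              MASTER_PASSWORD = 'secret',            -- placeholder
              MASTER_USE_GTID = slave_pos;
            START SLAVE;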


            vlad.lesin Vladislav Lesin added a comment -

            The binlog position is obtained with SHOW MASTER STATUS in mariabackup. The GTID of the last change is the gtid_current_pos variable, which is obtained with a SHOW VARIABLES statement (see write_binlog_info()). Both statements are executed under the BACKUP STAGE BLOCK_COMMIT lock (if the --no-lock option is not used).
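
            Expressed as plain SQL, the sequence is roughly the following (a sketch based on the description above; the exact statements issued by write_binlog_info() may differ):

            BACKUP STAGE START;
            BACKUP STAGE FLUSH;
            BACKUP STAGE BLOCK_DDL;
            BACKUP STAGE BLOCK_COMMIT;
            SHOW MASTER STATUS;                        -- binlog file + position
            SHOW VARIABLES LIKE 'gtid_current_pos';    -- "GTID of the last change"
            BACKUP STAGE END;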

            I failed to repeat it with an mtr test. I started replication with several slave threads and set watchpoints on the variables which are used to compute the binlog position (mysql_bin_log.log_file.pos_in_file, mysql_bin_log.log_file.write_pos, mysql_bin_log.log_file.request_pos; see also MYSQL_BIN_LOG::raw_get_current_log(), the function which is used to compute the binlog position on SHOW MASTER STATUS execution). I also set breakpoints on the function which is used to update gtid_current_pos (rpl_binlog_state::update_nolock()). Then I took the BLOCK_COMMIT lock on the slave and executed a batch of INSERTs on the master. There were no hits after BLOCK_COMMIT on the slave.

            I am trying to repeat the bug with sysbench.


            aponzo Andrea Ponzo (Inactive) added a comment -

            vlad.lesin, let me explain this issue better:
            we have a server A (MariaDB version is not important)
            and a server B (slave of A, MariaDB version 10.4.14);
            we need to create a server C (version 10.4.14) as a slave of A by performing a mariabackup on B.
            During the backup, traffic is flowing on master A.
            At the end of the backup we need to use the file xtrabackup_slave_info to get the CHANGE MASTER statement in order to connect C as a slave of A,

            and we have

            [root@m2 andre2]# cat xtrabackup_slave_info
            SET GLOBAL gtid_slave_pos = '0-1-18691';
            CHANGE MASTER TO master_use_gtid = slave_pos

            of course we will add the MASTER_HOST of A.
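
            Spelled out, the statements run on C would look like this (a sketch; only A's address, taken from the backup log above, is added to what the file provides):

            SET GLOBAL gtid_slave_pos = '0-1-18691';
            CHANGE MASTER TO
              MASTER_HOST = '192.168.56.101',
              MASTER_USE_GTID = slave_pos;
            START SLAVE;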

            Doing this, we get duplicate key errors because the above gtid_slave_pos is wrong.

            if we open the file
            [root@m2 andre2]# cat xtrabackup_binlog_info
            m2-bin.000007 25035029 0-1-18691

            this is the binlog file and position on server B, and if we calculate the GTID for that position:

            MariaDB [test]> SELECT BINLOG_GTID_POS("m2-bin.000007",25035029 );
            +--------------------------------------------+
            | BINLOG_GTID_POS("m2-bin.000007",25035029 ) |
            +--------------------------------------------+
            | 0-1-18699                                  |
            +--------------------------------------------+

            it is different from the one reported in xtrabackup_slave_info.

            So, MASTER_LOG_FILE and MASTER_LOG_POS are correct if we want to set up server C as a slave of B, but the GTID is not.

            The sequence-number difference is due to the number of parallel workers used:
            the difference is 1 for slave_parallel_threads = 0 or 1,
            and n for slave_parallel_threads = n on server B.
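
            Given that the file/position pair is correct, a possible workaround (our suggestion, not verified here) is to derive the GTID from that pair on B instead of trusting xtrabackup_slave_info:

            -- on server B, using the values from xtrabackup_binlog_info:
            SELECT BINLOG_GTID_POS('m2-bin.000007', 25035029);   -- returns 0-1-18699
            -- then on server C, use the returned GTID:
            SET GLOBAL gtid_slave_pos = '0-1-18699';
            CHANGE MASTER TO MASTER_USE_GTID = slave_pos;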

            Can you please report, from your test, the output of:
            cat xtrabackup_binlog_info
            SELECT BINLOG_GTID_POS(file, pos);

            thanks
            Andrea


            vlad.lesin Vladislav Lesin added a comment -

            My current assumption, which I am trying to reproduce in an MTR test, is the following:

            Consider the following code:

            int ha_commit_trans(THD *thd, bool all)                                         
            {
            ...
              if (!is_real_trans)                                                           
              {                                                                             
                error= commit_one_phase_2(thd, all, trans, is_real_trans, rw_trans);        
                goto done;                                                                  
              }
            ...
              cookie= tc_log->log_and_order(thd, xid, all, need_prepare_ordered,            
                                            need_commit_ordered);
            ...
              error= commit_one_phase_2(thd, all, trans, is_real_trans, rw_trans) ? 2 : 0;  
            ... 
            }
            

            tc_log->log_and_order() flushes the binlog and advances the binlog position. commit_one_phase_2() acquires the backup block-commit lock. As we can see, in the case when is_real_trans == true, the binlog is flushed before the block-commit lock is acquired.

            So, there can be the following scenario:
            T2 - a slave worker, which has started a transaction and executed some Write_rows events;
            T1 - the connection from mariabackup, which holds the backup block-commit lock;
            T2 - executes Xid_log_event in ha_commit_trans(), and as is_real_trans == true for Xid_log_event, it executes tc_log->log_and_order() first, i.e. flushes the binlog and advances the binlog position, but is then blocked in commit_one_phase_2() because the backup block-commit lock is held by T1;
            T1 - executes 'SHOW MASTER STATUS' and 'SHOW VARIABLES LIKE "gtid_current_pos"', and receives a binlog position which is ahead of gtid_current_pos.
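
            The race can be sketched with two client sessions (a hand-run illustration of the scenario above, not the MTR test itself; timing is what matters):

            -- session T1 (plays the mariabackup connection):
            BACKUP STAGE START;
            BACKUP STAGE FLUSH;
            BACKUP STAGE BLOCK_DDL;
            BACKUP STAGE BLOCK_COMMIT;
            -- session T2 (any committing connection): COMMIT a row change here.
            -- Its binlog flush in tc_log->log_and_order() completes, then it
            -- blocks on the MDL_BACKUP_COMMIT lock in commit_one_phase_2().
            -- back in session T1, the pair now disagrees:
            SHOW MASTER STATUS;                        -- already-advanced position
            SHOW VARIABLES LIKE 'gtid_current_pos';    -- not yet advanced
            BACKUP STAGE END;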


            vlad.lesin Vladislav Lesin added a comment -

            monty, I have just confirmed the above scenario with a test case: https://github.com/MariaDB/server/commit/cbcb921bb83be8b84142dfa73583724c8af843a8.

            vlad.lesin Vladislav Lesin added a comment - - edited

            I have just added one more test case that reproduces the scenario without replication: https://github.com/MariaDB/server/commit/f19ff5b68e70214c4d75a2165ce8f5ef8cd2739f. The problem is the same: the binlog flush in ha_commit_trans() is not covered by the backup lock.


            monty Michael Widenius added a comment -

            Code to review is in bb-10.4-monty.


            monty Michael Widenius added a comment -

            Fixed by partly reverting MDEV-21953 to put back MDL_BACKUP_COMMIT locking
            before log_and_order.

            The original problem in MDEV-21953 was that while a thread was waiting
            for other threads to commit in 'log_and_order', it held the
            MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
            MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
            This causes a deadlock, as the waited-for thread can never get past the
            MDL_BACKUP_COMMIT lock in ha_commit_trans.

            The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
            a thread is waiting for other 'previous' threads to commit. This ensures
            that no transactional thread keeps MDL_BACKUP_COMMIT while waiting, which
            ensures that there are no deadlocks any more.


            People

              Assignee: monty Michael Widenius
              Reporter: rpizzi Rick Pizzi (Inactive)