[MDEV-23586] Mariabackup: GTID saved for replication in 10.4.14 is wrong Created: 2020-08-26  Updated: 2021-08-27  Resolved: 2020-09-25

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Server
Affects Version/s: 10.4.14
Fix Version/s: 10.4.16, 10.5.7

Type: Bug Priority: Blocker
Reporter: Rick Pizzi Assignee: Michael Widenius
Resolution: Fixed Votes: 4
Labels: None

Issue Links:
Problem/Incident
causes MDEV-25334 FTWRL/Backup blocks DDL on temporary ... Closed
Relates
relates to MDEV-21611 mariabackup should update binlog posi... Open
relates to MDEV-21953 deadlock between BACKUP STAGE BLOCK_C... Closed

 Description   

It appears that the fix for https://jira.mariadb.org/browse/MDEV-21953 has introduced a serious regression: any backup taken with MariaDB Server 10.4.14 and above saves the wrong GTID.

To reproduce, take a mariabackup from a 10.4.14 instance, then notice that the reported position refers to a GTID that does not match it. In the example below, the PIT corresponds to position 25035029, but the reported GTID 0-1-18691 is wrong and, if used, yields several duplicate key errors. In fact, the correct GTID for that position would be 0-1-18699, as reported by the BINLOG_GTID_POS() function.

This wrong GTID is saved to xtrabackup_slave_info, and since the position is no longer saved there (WHY???), the resulting backup no longer carries valid PIT information.
This is hitting a customer of ours and makes all backups inconsistent due to wrong PIT info, hence the critical priority.

Thanks to aponzo who found and verified this issue.

Workaround: downgrade server to 10.4.13

[00] 2020-08-26 12:25:52 Backup created in directory '/root/backup/andre2/xtrabackup_backupfiles/'
[00] 2020-08-26 12:25:52 MySQL binlog position: filename 'm2-bin.000007', position '25035029', GTID of the last change '0-1-18691'
[00] 2020-08-26 12:25:52 MySQL slave binlog position: master host '192.168.56.101', gtid_slave_pos 0-1-18691
[00] 2020-08-26 12:25:52 Streaming backup-my.cnf
[00] 2020-08-26 12:25:52         ...done
[00] 2020-08-26 12:25:52 Streaming xtrabackup_info
[00] 2020-08-26 12:25:52         ...done
[00] 2020-08-26 12:25:52 Redo log (from LSN 29523906 to 31884588) was copied.
[00] 2020-08-26 12:25:52 completed OK!
prepare
2020-08-26 12:26:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=30269374
2020-08-26 12:26:53 0 [Note] InnoDB: Starting final batch to recover 286 pages from redo log.
2020-08-26 12:26:53 0 [Note] InnoDB: Last binlog file './m2-bin.000007', position 25035029
[00] 2020-08-26 12:26:54 Last binlog file ./m2-bin.000007, position 25035029
[00] 2020-08-26 12:26:54 completed OK!
MariaDB [test]> SELECT BINLOG_GTID_POS("m2-bin.000007",25035029 );
+--------------------------------------------+
| BINLOG_GTID_POS("m2-bin.000007",25035029 ) |
+--------------------------------------------+
| 0-1-18699                                  |
+--------------------------------------------+
1 row in set (0.133 sec)
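Until a fixed version is available, the correct GTID can be re-derived by hand from the binlog file and position saved in xtrabackup_binlog_info. This is only a sketch of the idea, using the values from the example above; the binlog containing that position must still be available on the backed-up server, and the master host is the one already reported by mariabackup:

-- On the server the backup was taken from, translate the saved
-- binlog file/position into a GTID:
SELECT BINLOG_GTID_POS('m2-bin.000007', 25035029);   -- 0-1-18699 in this example

-- On the restored server, use that GTID instead of the one written
-- to xtrabackup_slave_info:
SET GLOBAL gtid_slave_pos = '0-1-18699';
CHANGE MASTER TO master_host = '192.168.56.101', master_use_gtid = slave_pos;
START SLAVE;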



 Comments   
Comment by Rick Pizzi [ 2020-08-26 ]

From further testing:

  • problem hits with slave_parallel_threads = 0 too
  • the drift between the incorrect and the correct GTID appears to equal the number of parallel threads (with no parallel threads there is only one applier thread, so the drift is 1)
Comment by Geoff Montee (Inactive) [ 2020-08-26 ]

In the example below, the PIT corresponds to position 25035029, but the reported GTID 0-1-18691 is wrong and, if used, yields several duplicate key errors. In fact, the correct GTID for that position would be 0-1-18699, as reported by the BINLOG_GTID_POS() function.

Are you using the position printed by mariabackup --prepare to validate the correctness of the position in xtrabackup_binlog_info? i.e. this position?:

2020-08-26 12:26:53 0 [Note] InnoDB: Last binlog file './m2-bin.000007', position 25035029
[00] 2020-08-26 12:26:54 Last binlog file ./m2-bin.000007, position 25035029

If so, then I want to point out that the position printed by mariabackup --prepare is not guaranteed to be consistent, so it is not a good idea to use this position to validate the position in xtrabackup_binlog_info. In fact, the position printed by mariabackup --prepare should never be used at all. For more information about that, see MDEV-21611.

Only the positions in xtrabackup_binlog_info and xtrabackup_slave_info are currently guaranteed to be consistent by Mariabackup.

In other words, it is totally normal and expected if the position saved in xtrabackup_binlog_info is different from the position printed by Mariabackup, because only the former position is supposed to be consistent.

Is your position in xtrabackup_binlog_info actually incorrect? I don't know, so I can't comment on that. I just want to point out that it is unsafe to use the position printed by mariabackup --prepare to make any judgement about the correctness of the position in xtrabackup_binlog_info.

Comment by Rick Pizzi [ 2020-08-27 ]

The printed position and the position saved in the file are identical.

Comment by Michael Widenius [ 2020-08-27 ]

What is PIT? Point in time? (Binlog position would maybe be a better term in that case)

From what I understand from above, the printed binlog file and positions are correct.
Only the GTID printed by mariabackup is wrong. Is this correct?

Comment by Rick Pizzi [ 2020-08-27 ]

Yes, sorry, I meant the binlog position - which is correct; only the GTID is wrong.

The issue is that in the file xtrabackup_slave_info only the GTID is saved. I think it would be best to go back to also printing the position...

Rick

Comment by Rick Pizzi [ 2020-08-27 ]

julien.fritsch A test case is trivial to build for this issue... just set up one master and one replica, have GTID enabled and parallel slave threads > 1, then take a mariabackup on the replica and verify that the GTID saved in the slave_info file does not correspond to the GTID of the saved position as reported by BINLOG_GTID_POS().
You need some traffic flowing on the master for this, of course; we used sysbench.
As an additional proof - if the sysbench workload is inserting rows, using the recommended GTID to set up replication will result in duplicate key errors.
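For completeness, a rough sketch of the replica-side setup and of the check described above (the thread count is illustrative, anything > 1 will do; the file name and position must be the ones reported for your backup):

-- On the replica, before taking the backup (slave_parallel_threads can only
-- be changed while the slave threads are stopped):
STOP SLAVE;
SET GLOBAL slave_parallel_threads = 4;   -- illustrative value > 1
START SLAVE;

-- After the backup, on the replica, compare the GTID in xtrabackup_slave_info
-- with the GTID computed for the saved binlog file/position:
SELECT BINLOG_GTID_POS('m2-bin.000007', 25035029);   -- substitute values from xtrabackup_binlog_info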

Comment by Vladislav Lesin [ 2020-09-01 ]

In mariabackup, the binlog position is obtained with SHOW MASTER STATUS, and the GTID of the last change is the gtid_current_pos variable, obtained with a SHOW VARIABLES statement (see write_binlog_info()). Both statements are executed under the BACKUP STAGE BLOCK_COMMIT lock (unless the --no-lock option is used).
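For reference, the relevant statement sequence looks roughly like this (a simplified sketch, not the exact mariabackup code path; the intermediate BACKUP STAGE FLUSH and BLOCK_DDL stages are omitted):

BACKUP STAGE START;
BACKUP STAGE BLOCK_COMMIT;                      -- commits are now supposed to be blocked
SHOW MASTER STATUS;                             -- binlog file name + position
SHOW VARIABLES LIKE 'gtid_current_pos';         -- "GTID of the last change"
BACKUP STAGE END;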

I failed to repeat it with an mtr test. I started replication with several slave threads and set watchpoints on the variables used to compute the binlog position (mysql_bin_log.log_file.pos_in_file, mysql_bin_log.log_file.write_pos, mysql_bin_log.log_file.request_pos; see also MYSQL_BIN_LOG::raw_get_current_log(), the function used to compute the binlog position when SHOW MASTER STATUS is executed). I also set breakpoints on the function used to update gtid_current_pos (rpl_binlog_state::update_nolock()). Then I took the BLOCK_COMMIT lock on the slave and executed a bunch of inserts on the master. There were no hits after BLOCK_COMMIT on the slave.

I am trying to repeat the bug with sysbench.

Comment by Andrea Ponzo (Inactive) [ 2020-09-02 ]

vlad.lesin, let's explain this issue better:
we have a server A (MariaDB version is not important),
a server B (slave of A, MariaDB version 10.4.14),
and we need to create a server C (version 10.4.14) as a slave of A by performing a mariabackup on B.
During the backup, traffic is flowing on master A.
At the end of the backup we need to use the file xtrabackup_slave_info to get the CHANGE MASTER statement in order to connect C as a slave of A.

and we have

[root@m2 andre2]# cat xtrabackup_slave_info
SET GLOBAL gtid_slave_pos = '0-1-18691';
CHANGE MASTER TO master_use_gtid = slave_pos

of course we will add the MASTER_HOST of A

Doing this we get duplicate key errors because the above gtid_slave_pos is wrong.

if we open the file
[root@m2 andre2]# cat xtrabackup_binlog_info
m2-bin.000007 25035029 0-1-18691

These are the binlog file and position of server B;
we then calculate the GTID for this position:

MariaDB [test]> SELECT BINLOG_GTID_POS("m2-bin.000007",25035029 );
+--------------------------------------------+
| BINLOG_GTID_POS("m2-bin.000007",25035029 ) |
+--------------------------------------------+
| 0-1-18699                                  |
+--------------------------------------------+

It is different from the one reported in xtrabackup_slave_info.

So, the MASTER_LOG_FILE and MASTER_LOG_POS are correct if we want to set up server C as a slave of B, but the GTID is not.

The sequence number difference is due to the number of parallel workers used:
the difference is 1 for 0 or 1 parallel workers,
and n for slave_parallel_threads = n used on server B.

Can you please report, on your test, the output of :
cat xtrabackup_binlog_info
SELECT BINLOG_GTID_POS(file,pos);

thanks
Andrea

Comment by Vladislav Lesin [ 2020-09-02 ]

My current assumption, which I am trying to reproduce in an MTR test, is the following:

Consider the following code:

int ha_commit_trans(THD *thd, bool all)                                         
{
...
  if (!is_real_trans)                                                           
  {                                                                             
    error= commit_one_phase_2(thd, all, trans, is_real_trans, rw_trans);        
    goto done;                                                                  
  }
...
  cookie= tc_log->log_and_order(thd, xid, all, need_prepare_ordered,            
                                need_commit_ordered);
...
  error= commit_one_phase_2(thd, all, trans, is_real_trans, rw_trans) ? 2 : 0;  
... 
}

tc_log->log_and_order() flushes the binlog and changes the binlog position. commit_one_phase_2() acquires the backup block commit lock. As we can see, in the case when is_real_trans==true, the binlog is flushed before the block commit lock is acquired.

So, there can be the following scenario:
T2 - a slave worker that has started a transaction and executed some Write_rows events;
T1 - the connection from mariabackup, which holds the backup block commit lock;
T2 - executes the Xid_log_event in ha_commit_trans(); since is_real_trans == true for Xid_log_event, it executes tc_log->log_and_order() first, i.e. flushes the binlog and advances the binlog position, but then it is blocked in commit_one_phase_2() because the backup block commit lock is held by T1;
T1 - executes 'SHOW MASTER STATUS' and 'SHOW VARIABLES LIKE "gtid_current_pos"', and receives a binlog position that is ahead of gtid_current_pos.
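On an affected build, the same window can be illustrated with two plain client connections and no replication at all (a simplified sketch of the scenario above, not the actual MTR test; it assumes the binary log is enabled and that a hypothetical table t1 with an INT primary key already exists):

-- connection con1 (playing mariabackup's role):
BACKUP STAGE START;
BACKUP STAGE BLOCK_COMMIT;

-- connection con2 (T2 in the scenario above); the COMMIT flushes the binlog
-- in log_and_order() and then hangs in commit_one_phase_2() on the backup lock:
BEGIN;
INSERT INTO t1 VALUES (1);
COMMIT;

-- connection con1, while con2's COMMIT is still hanging:
SHOW MASTER STATUS;                       -- already shows the advanced position
SHOW VARIABLES LIKE 'gtid_current_pos';   -- still shows the old GTID
BACKUP STAGE END;                         -- lets con2's COMMIT finish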

Comment by Vladislav Lesin [ 2020-09-03 ]

monty, I have just confirmed the above scenario with a test case: https://github.com/MariaDB/server/commit/cbcb921bb83be8b84142dfa73583724c8af843a8.

Comment by Vladislav Lesin [ 2020-09-04 ]

I have just added one more test case to reproduce the scenario without replication: https://github.com/MariaDB/server/commit/f19ff5b68e70214c4d75a2165ce8f5ef8cd2739f. The problem is the same - the binlog flush in ha_commit_trans() is not covered by the backup lock.

Comment by Michael Widenius [ 2020-09-07 ]

Code to review is in bb-10.4-monty

Comment by Michael Widenius [ 2020-09-25 ]

Fixed by partly reverting MDEV-21953 to put back MDL_BACKUP_COMMIT locking
before log_and_order.

The original problem in MDEV-21953 was that while a thread was waiting
for other threads to commit in 'log_and_order', it held the
MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
This causes a deadlock, as the waited-for thread can never get past the
MDL_BACKUP_COMMIT lock in ha_commit_trans().

The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
a thread is waiting for other 'previous' threads to commit. This ensures
that no transactional thread keeps MDL_BACKUP_COMMIT while waiting, which
ensures that there are no deadlocks any more.
