Details
-
Bug
-
Status: Closed (View Workflow)
-
Blocker
-
Resolution: Fixed
-
10.4.14
-
None
Description
It appears that the fix for https://jira.mariadb.org/browse/MDEV-21953 has introduced a serious regression and now any backup taken with MariaDB server 10.4.14 and above saves the wrong GTID.
To reproduce, take a mariabackup from a 10.4.14 instance, then notice how the reported position does refer to a GTID which is not proper. In the example below, the PIT corresponds to position 25035029, but the reported GTID 0-1-18691 is wrong and if used, yields several duplicate keys errors. In fact the correct GTID for that position would be 0-1-18699 as reported by BINLOG_GTID_POS() function.
This wrong GTID is saved to the xtrabackup_slave_info and since the position is no longer saved there (WHY???) the resulting backup does not carry a valid PIT information anymore.
This is hitting a customer of ours and makes all backups inconsistent due to wrong PIT info, hence the critical priority.
Thanks to aponzo who found and verified this issue.
Workaround: downgrade server to 10.4.13
[00] 2020-08-26 12:25:52 Backup created in directory '/root/backup/andre2/xtrabackup_backupfiles/'
|
[00] 2020-08-26 12:25:52 MySQL binlog position: filename 'm2-bin.000007', position '25035029', GTID of the last change '0-1-18691'
|
[00] 2020-08-26 12:25:52 MySQL slave binlog position: master host '192.168.56.101', gtid_slave_pos 0-1-18691
|
[00] 2020-08-26 12:25:52 Streaming backup-my.cnf
|
[00] 2020-08-26 12:25:52 ...done
|
[00] 2020-08-26 12:25:52 Streaming xtrabackup_info
|
[00] 2020-08-26 12:25:52 ...done
|
[00] 2020-08-26 12:25:52 Redo log (from LSN 29523906 to 31884588) was copied.
|
[00] 2020-08-26 12:25:52 completed OK!
|
prepare
|
2020-08-26 12:26:53 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=30269374
|
2020-08-26 12:26:53 0 [Note] InnoDB: Starting final batch to recover 286 pages from redo log.
|
2020-08-26 12:26:53 0 [Note] InnoDB: Last binlog file './m2-bin.000007', position 25035029
|
[00] 2020-08-26 12:26:54 Last binlog file ./m2-bin.000007, position 25035029
|
[00] 2020-08-26 12:26:54 completed OK!
|
MariaDB [test]> SELECT BINLOG_GTID_POS("m2-bin.000007",25035029 );
|
+--------------------------------------------+
|
| BINLOG_GTID_POS("m2-bin.000007",25035029 ) |
|
+--------------------------------------------+
|
| 0-1-18699 |
|
+--------------------------------------------+
|
1 row in set (0.133 sec)
|
Attachments
Issue Links
- causes
-
MDEV-25334 FTWRL/Backup blocks DDL on temporary tables with binlog enabled, assertion fails in Diagnostics_area::set_error_status
-
- Closed
-
- relates to
-
MDEV-21611 mariabackup should update binlog position in InnoDB during prepare
-
- Open
-
-
MDEV-21953 deadlock between BACKUP STAGE BLOCK_COMMIT and parallel replication
-
- Closed
-
My current assumption, which I am trying to repeat in MTR test, is the following:
Consider the following code:
{
...
{
error= commit_one_phase_2(thd, all, trans, is_real_trans, rw_trans);
}
...
cookie= tc_log->log_and_order(thd, xid, all, need_prepare_ordered,
need_commit_ordered);
...
...
}
tc_log->log_and_order() flushes binlog and changes binlog position. commit_one_phase_2() acquires backup block commit lock. As we can see, in the case, when is_real_trans==true, binlog is flushed before block commit is acquired.
So, there can be the following scenario:
T2 - slave worker, have started transaction and executed some Write_rows events;
T1 - connection from mariabackup, holds backup block commit lock;
T2 - executes Xid_log_event in ha_commit_trans(), and as is_real_commit == true for Xid_log_event, it executes tc_log->log_and_order() first, i.e. flushes binlog and changes binlog position, but then it is blocked in commit_one_phase_2() because backup block commit lock is held by T1;
T1 - executes 'SHOW MASTER STATUS' and 'SHOW VARIABLES LIKE "gtid_current_pos"', and receives binlog position, which is greater then gtid_current_pos.