[MDEV-14543] incremental prepare doesn't complain about incorrect lsn
Created: 2017-11-30  Updated: 2018-01-08  Resolved: 2018-01-08

Status: Closed
Project: MariaDB Server
Component/s: Backup
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Andrii Nikitin (Inactive)
Assignee: Marko Mäkelä
Resolution: Won't Fix
Votes: 0
Labels: upstream

Sprint: 10.2.12

Description

Currently, mariabackup 10.2 does not complain when the user attempts to apply an incremental backup to an incorrect base backup (probably corrupting data).
10.1 verifies to_lsn and from_lsn in the xtrabackup_checkpoints file, while 10.2 seems to ignore them.

The test below succeeds in 10.1 (meaning that mariabackup correctly reports an error) but fails in 10.2:

call mtr.add_suppression("InnoDB: New log files created");
 
let $fullbackup_old=$MYSQLTEST_VARDIR/tmp/backup_old;
let $fullbackup_new=$MYSQLTEST_VARDIR/tmp/backup_new;
let $incremental_dir=$MYSQLTEST_VARDIR/tmp/backup_inc1;
 
CREATE TABLE t(i INT PRIMARY KEY) ENGINE INNODB;
INSERT INTO t VALUES(1);
 
echo # Create full backup then incremental;
--disable_result_log
exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$fullbackup_old;
INSERT INTO t VALUES(2);
exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$incremental_dir --incremental-basedir=$fullbackup_old;
 
INSERT INTO t VALUES(3);
 
echo # Create and prepare new backup;
exec $XTRABACKUP --defaults-file=$MYSQLTEST_VARDIR/my.cnf  --backup --target-dir=$fullbackup_new;
exec $XTRABACKUP --prepare --apply-log-only --target-dir=$fullbackup_new;
 
echo # try to apply old incremental to wrong full;
--error 1
exec $XTRABACKUP --prepare --target-dir=$fullbackup_new --incremental-dir=$incremental_dir ;
--enable_result_log
 
DROP TABLE t;
 
# Cleanup
rmdir $fullbackup_new;
rmdir $fullbackup_old;
rmdir $incremental_dir;
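
The to_lsn/from_lsn verification that the description refers to can be approximated outside mariabackup as a pre-flight check. The following is only a sketch: it assumes that xtrabackup_checkpoints consists of "key = value" lines including from_lsn and to_lsn, and the helper names read_checkpoints and incremental_matches_base are made up for this illustration; they are not part of mariabackup.

```python
# Sketch only: compare the checkpoint metadata of a base backup and an
# incremental backup before running --prepare. Helper names are hypothetical.
from pathlib import Path


def read_checkpoints(backup_dir):
    """Parse the 'key = value' lines of <backup_dir>/xtrabackup_checkpoints."""
    fields = {}
    text = Path(backup_dir, "xtrabackup_checkpoints").read_text()
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            fields[key.strip()] = value.strip()
    return fields


def incremental_matches_base(base_dir, incremental_dir):
    """Return True when the incremental's from_lsn equals the base's to_lsn."""
    base = read_checkpoints(base_dir)
    inc = read_checkpoints(incremental_dir)
    return int(inc["from_lsn"]) == int(base["to_lsn"])
```

A wrapper script could call incremental_matches_base(base_dir, incremental_dir) and abort before --prepare when it returns False. As the comments on this issue point out, the equality can also hold for a mismatched base when no redo log checkpoint occurred between the backups, so passing this check does not prove that the pair is compatible.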



Comments
Comment by Marko Mäkelä [ 2018-01-08 ]

Both Mariabackup 10.1 and 10.2 would seem to corrupt the files, but not entirely without complaining:

10.1

2018-01-08 11:21:37 7ffff7fba740 InnoDB: Error: page 5 log sequence number 1632972
InnoDB: is in the future! Current system log sequence number 1632703.
InnoDB: Your database may be corrupt or you may have copied the InnoDB
InnoDB: tablespace but not the InnoDB log files. See
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: for more information.

10.2

2018-01-08 11:18:13 140737353858112 [Warning] InnoDB: Are you sure you are using the right ib_logfiles to start up the database? Log sequence number in the ib_logfiles is 1627873, less than the log sequence number in the first system tablespace file header, 1631299.
2018-01-08 11:18:13 140737353858112 [ERROR] InnoDB: Page [page id: space=0, page number=313] log sequence number 1631299 is in the future! Current system log sequence number 1631198.

Comment by Marko Mäkelä [ 2018-01-08 ]

I cannot reproduce the issue with the attached test case. There would only be logically empty .delta files in the backup, consisting only of a 16KiB end-of-file marker page. Also, in each xtrabackup_info file, the innodb_to_lsn would be the same (very strange; there were changes to the table between the backups).

I think that two things could be done:

  1. revise xtrabackup_apply_delta() so that it would refuse to apply changes when the FIL_PAGE_LSN is earlier than what exists in the data file
  2. revise the condition in xtrabackup_prepare_func() so that it would actually kick in:

    	if (xtrabackup_incremental
    	    && metadata_to_lsn != incremental_lsn) {
    		msg("mariabackup: error: This incremental backup seems "
    		    "not to be proper for the target.\n"
    		    "mariabackup:  Check 'to_lsn' of the target and "
    		    "'from_lsn' of the incremental.\n");
    		exit(EXIT_FAILURE);
    	}
    

    We have metadata_to_lsn==incremental_lsn here. If we compared to metadata_last_lsn instead of metadata_to_lsn, the check would fail. It appears that to_lsn is a redo log checkpoint and last_lsn is the latest scanned LSN (the end of the redo log). So, apparently both the incremental backup and the last full backup started from the same log checkpoint LSN.
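
The two ideas above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the mariabackup implementation: it relies on FIL_PAGE_LSN being an 8-byte big-endian field at byte offset 16 of the InnoDB page header, and the function names are hypothetical.

```python
# Sketch only: models the two proposed checks; this is not mariabackup code.
import struct

FIL_PAGE_LSN = 16  # byte offset of the 8-byte page LSN in the page header


def page_lsn(page):
    """Return the big-endian 64-bit LSN stored in a page header."""
    return struct.unpack_from(">Q", page, FIL_PAGE_LSN)[0]


def delta_page_is_applicable(delta_page, existing_page):
    """Idea 1: refuse a delta page whose LSN is earlier than the LSN of
    the page that already exists in the data file."""
    return page_lsn(delta_page) >= page_lsn(existing_page)


def lsn_check_rejects(metadata_lsn, incremental_lsn):
    """Idea 2: the shape of the condition in xtrabackup_prepare_func();
    metadata_lsn stands for either to_lsn or last_lsn of the target."""
    return metadata_lsn != incremental_lsn
```

With illustrative values (not taken from a real backup) of to_lsn = 1000 and last_lsn = 1200 for the target and from_lsn = 1000 for the incremental, lsn_check_rejects(1000, 1000) is False, so the existing comparison lets the mismatched pair through, while lsn_check_rejects(1200, 1000) is True. Whether comparing against last_lsn would also reject valid incremental chains is not examined here.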

I believe that in the posted test, it is highly unlikely to get a redo log checkpoint (or a data page flush) between the INSERT statements. The lack of page writes explains why the .delta files are logically empty. Some sleeps could help to reproduce the problem.

All this said, I am not sure if this problem is worth fixing, or if it is possible to detect all mismatches between the full backup and the incremental backup.

Upstream xtrabackup seems to be suffering from the same problem.

Comment by Marko Mäkelä [ 2018-01-08 ]

I would not fix this unless and until a customer complains. I think that long term, we should have a streaming-capable BACKUP statement inside the server and avoid any log file processing altogether.
